Existing O-O DBMSs lack a solid mathematical foundation for the manipulation
of O-O databases, optimization of queries, and the design and selection of storage
structures for supporting O-O database manipulations. An association algebra
(A-algebra) is prescribed for serving as a mathematical foundation for processing O-O
databases, which is analogous to the use of relational algebra for processing relational
databases. In this algebra, objects and their associations in an O-O database are
uniformly represented by association patterns which are manipulated by a number of
operators to produce other association patterns. Different from the relational
algebra, in which set operations operate on relations with union-compatible structures,
the A-algebra operators can operate on association patterns of both homogeneous and
heterogeneous structures. Different from the traditional record-based relational
processing, the A-algebra allows very complex patterns of object associations to be
directly manipulated. Pattern-based query formulation and the A-algebra operators
are described. Some mathematical properties of the algebraic operators are
presented together with their application in query decomposition and optimization.
The completeness of the A-algebra is also defined and proven. The A-algebra has
been used as the basis for the design and implementation of an object-oriented query
language, OQL, which is the query language used in a prototype Knowledge Base
Management System OSAM*.KBMS.


... 
CHAPTER 1
INTRODUCTION 

In the past two decades, techniques of data modeling have gone through two
major conceptual changes. First, in the early 1970s, E. F. Codd observed that future
database systems should allow application programs and terminal users to remain
unaffected by changes made to the internal data representation (or the storage
structure) of a database. He introduced the relational data model [COD70] and
proposed the relational algebra and relational calculus [COD72a] as the
mathematical foundation for processing relational databases. The relational model
provides two levels of data independence in a three-level architecture for a
database management system as shown in Figure 1.1 (figures of each chapter are
placed at the end of the chapter). At the lower level, physical data independence
is provided, i.e., the logical representation of a relational database is a set of
relations (i.e., flat tables), which is independent of the physical (data and storage)
structures in which data are stored. At the higher level, logical data independence
is provided, i.e., the external view remains unchanged when the logical view
of a database is modified (note that the external view remains unchanged only for
some schema modifications). Besides simple logical representation and data
independence, the fact that the relational model has a solid mathematical foundation
is very important and has contributed to the success of the model and the
existing relational database management systems.


However,  the  relational  model  and  relational  systems  have  some  limitations. 
For  example,  the  model  captures  rather  limited  structural  properties  of  real-world 
entities  or  objects.  The  construct  of  aggregation  hierarchy  which  models  complex 
objects  and  the  construct  of  generalization  which  models  the  superclass-subclass 
relationship are not provided. In the relational model, data which describe a complex
object are scattered among a number of normalized relations and accessing
that data involves time-consuming traversal and assembly of data stored in multiple
relations. The model also does not allow behavioral properties of
entities/objects to be explicitly defined.

The second conceptual change of data modeling techniques occurred in the
early 1980s. The object-oriented paradigm, first introduced in the programming
language SIMULA [DAH67] and made very popular through the language
SMALLTALK [GOL81], allows richer structural constructs and behavioral properties
of objects to be specified at the logical level independent of their physical
implementations. Several features of the paradigm such as abstract data types,
inheritance, encapsulation, information hiding, polymorphism, etc. have been
shown to be useful for data modeling and system development. The object
encapsulation concept adds a level of data independence between the physical and the
logical independences introduced in the relational model, as depicted in Figure 1.2.
It requires that the structural and behavioral properties of an object be (logically)
encapsulated in its class in the conceptual view of an O-O database. Since then, a
number of Object-Oriented (O-O) and semantic data models have been proposed
[HAM81, BAT84, KIN84, ZAN85a, ZAN85b, DAD86, MAI86, MAN86, SU86,
ZDO86, WOE86, BAN87, FIS87, HOR87, HUL87, KIM87, ROW87, CAR88,
COL89, SU89], which offer more powerful constructs for modeling the structural
and behavioral properties of objects found in advanced applications such as
CAD/CAM, CASE, and decision support systems.

An O-O semantic data model can be structurally and/or behaviorally
object-oriented [DIT86]. A structurally O-O data model is one that encompasses at least
the following characteristics:

(1) It supports the unique identification of objects, that is, each object has a
unique object identifier (surrogate) which is valid for the lifetime of the
object.

(2) It categorizes those objects which can be described by the same set of
characteristics (attributes) into an object class.

(3) It allows aggregation (association) hierarchies to be defined.

(4) It allows generalization (association) hierarchies to be defined.

The O-O view of an application world is represented in the form of a network
of classes and associations. An object class can be either a primitive-class whose
instances are of simple data types (e.g., string, integer) or a nonprimitive-class
(e.g., Part, Student, Teacher). At the extensional level, instances of different
classes can be related (associated) with each other, forming patterns of object
associations. A behaviorally object-oriented data model, on the other hand, is one in
which operations that describe the behavior of the objects of a class can be defined
and registered with that class. Programs or methods that implement the operations
defined for an object are transparent to the user of the objects.


For these models to be truly useful, they must provide some object manipulation
languages, which can take advantage of the expressive power of the models
and provide the users with simple and powerful querying facilities. Recently,
several query languages such as DAPLEX [SHI81], GEM [ZAN83, TSU84], ARIEL
[MAC85], FAD [BAN87], POSTQUEL [ROW87], EXCESS [CAR88], and others
reported in [DAD86, MAN86, SER86, BAN87, FIS87, BAN88, COL89, SHA90]
have been proposed. These languages were developed based on different paradigms.
For example, DAPLEX and the query language of [MAN86] are based on
the functional paradigm. The query language of [BAN88] is based on the message
passing paradigm. Other query languages are based on the relational paradigm:
an extension of QUEL [ROW87, CAR88]; an extension of SQL [DAD86]; and an
extension of the relational algebra [COL89]. The query language of [FIS87] is
based on both functional and relational paradigms, allowing functions to be used
in object-oriented SQL (OSQL) constructs.

The above languages have an O-O flavor and have taken significant steps
towards the development of a powerful O-O query language. Query languages
such as DAPLEX [SHI81], GEM [ZAN83], ARIEL [MAC85], and the
object-oriented query language described in [BAN88], are based on the view of a
database defined in terms of objects, object classes, and their associations. A query in
these languages is formulated by specifying one class (usually a nonprimitive-class,
whose instances are real world objects) in the schema as a central class with some
path expressions. Each path expression starts from the central class and ends at
another class (usually a primitive-class, whose instances are of basic data types
such as integer, string, set, etc.). A restriction condition can be specified on the
class referenced at the end of a path expression. This class can also be specified in
the list of attributes to be retrieved. The result of a query is a set of tuples, each
of which corresponds to a single instance of the central class and contains values
related to that instance which are collected from classes specified in the list.

A  major  drawback  of  these  query  languages  is  that  they  do  not  maintain  the 
closure  property  [ALA89b].  A  query  language  is  said  to  be  closed  if  the  result  of  a 
query  can  be  further  queried  by  other  queries  specified  in  the  same  language.  In 
the  above  mentioned  languages,  the  input  to  a  query  has  an  O-O  representation 
(i.e.,  a  network  of  objects,  classes,  and  their  associations)  whereas  its  output  is  a 
relation  which  does  not  have  the  same  structural  and  behavioral  properties  as  the 
original  objects.  Consequently,  the  result  of  a  query  cannot  be  further  processed 
by  the  same  set  of  operators.  The  design  of  these  languages  is  very  much 
influenced  by  the  relational  model  and  relational  languages  which  are  concerned 
mainly with retrieval and storage operations. In O-O processing, objects in
different  classes  that  satisfy  some  search  conditions  are  subject  to  different  user- 
defined  operations.  The  idea  of  collecting  data  to  form  a  resulting  relation  does 
not  satisfy  this  processing  model. 

The query languages proposed in [DAD86, MAN86, BAN87, ROW87, CAR88,
COL89] use nested relations as their logical views of O-O databases. Although
these languages are closed, i.e., operators in these languages operate on nested
relations to produce nested relations, the nested relation is not a proper logical
representation for an O-O database which is basically a network structure of
object associations. Mapping from a network representation to nested relations is
an additional process. Furthermore, in order to use a nested relation to represent
complex network structures, a considerable amount of data has to be introduced
to relate these nested relations. It is our view that the query language and its
underlying algebra should directly support the manipulation of network structures.

A query algebra [SHA90] was proposed recently based on the O-O model
ENCORE [ELM89]. Although ENCORE models applications as networks of
objects, object types, and their associations, the domain of the algebra is defined
as sets of objects of the Tuple type, which is essentially the nested relation
representation since it allows the nesting of tuples. Therefore, the mapping problem
addressed above still remains. In this algebra, two identical queries or two
identical operations in a single query do not give the same response, since each
produces a new object in the database. To eliminate duplicated copies of the
same newly created object, the algebra introduces operations like DupEliminate
and Coalesce, which would not have been necessary if the algebra were to directly
support the network-structured processing of O-O databases. We further observe
that the union operation in this algebra may produce a collection of objects having
the same data type but with different structures (e.g., the union of two collections
of objects of the Tuple type with different arities). Nevertheless, the other operators
introduced in the algebra are not defined to operate on collections of objects
with heterogeneous structures.

A common limitation of many existing query languages is that they cannot
easily express "non-association" relationships between objects, i.e., identify objects
in two classes that are not associated with each other while their classes are. For
example, in an O-O database, let us assume that Suppliers s1 and s2 supply Parts
p1 and p2, respectively. GEM, POSTQUEL, and several other query languages
provide the "dot" construct (Suppliers.Parts) and ARIEL provides the "of"
construct (Parts of Suppliers) to navigate from the class Suppliers to the class Parts
to produce object pairs (s1,p1) and (s2,p2). However, they do not have a language
construct for specifying the semantics that s1 does not supply p2 and s2 does not
supply p1. Similarly, in functional languages, only the function Parts(Suppliers) is
provided to specify the associations of (s1,p1) and (s2,p2) but not the non-association
of suppliers and parts.
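
The supplier-part example can be made concrete with a small sketch. The Python below is only an illustration under assumed names (`suppliers`, `parts`, and `supplies` are hypothetical, and the two helper functions are not operators of any of the languages discussed). It treats the association between the two classes as a set of object pairs and derives the non-associated pairs as the remaining pairs drawn from the two classes.

```python
from itertools import product

# Hypothetical extensions of the two classes in the example.
suppliers = {"s1", "s2"}
parts = {"p1", "p2"}

# Associations stated in the text: s1 supplies p1, and s2 supplies p2.
supplies = {("s1", "p1"), ("s2", "p2")}


def associated(pairs):
    """Pairs obtained by navigating Suppliers -> Parts (what "dot"/"of" yield)."""
    return set(pairs)


def non_associated(left, right, pairs):
    """Pairs of objects whose classes are associated but which are not linked."""
    return set(product(left, right)) - set(pairs)


print(sorted(associated(supplies)))
print(sorted(non_associated(suppliers, parts, supplies)))
```

The second call yields the pairs (s1,p2) and (s2,p1), which is the information that a navigation-only construct has no direct way to request.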

In view of the disadvantages of the existing O-O query languages, we would
like to stress the importance of using a graph as the logical representation of an
O-O database at both intensional and extensional levels as exemplified by O2
[LEC88], FAD [BAN87], and OSAM* [SU89]. The query language and its underlying
algebra should provide constructs to directly process graphs with different
degrees of complexity. They should also support the specification of
non-associations and the processing of heterogeneous structures. Furthermore, the
closure property should be maintained.

In this dissertation, we propose an association algebra (A-algebra) based on
the graph representation of O-O databases and the association-based query
formulation (refer to Chapter 3). Analogous to the development of the relational
algebra for relational databases, the development of the A-algebra provides the formal
foundation for query processing and optimization in O-O databases and for
designing O-O query languages. Unlike the record(tuple)-based relational algebra
[COD70 and COD72] and the query algebra [SHA90], the A-algebra is
association-based, i.e., the domain of the algebra is sets of association patterns
(e.g., linear structures, trees, lattices, networks, etc.) and processing an O-O
database is based on the matching and manipulation of homogeneous as well as
heterogeneous patterns of object associations. Operators of the A-algebra can be used
to navigate a network of interconnected object classes along the path of interest to
construct a complex pattern as the search condition. They can also be used to
decompose a complicated pattern into simple ones. Ten operators have been
defined for the algebra: three unary operators [A-Select (σ), A-Project (Π), and
A-Integrate (/)], and seven binary operators [Associate (*), A-Complement (|),
A-Union (+), A-Difference (-), A-Divide (÷), NonAssociate (!), and A-Intersect (•)],
where the prefix A stands for "Association". Although many of these operators
correspond to the relational algebra operators, they are different from them in
that they can operate on complicated heterogeneous structures. In this respect,
the A-algebra is more general than the relational algebra.

The  rest  of  this  dissertation  is  organized  as  follows.  A  detailed  survey  on  the 
relational  model  and  the  relational  algebra,  the  existing  O-O  query  languages,  and 
a  recently  proposed  query  algebra  is  provided  in  Chapter  2.  The  graphical 
representation  of  O-O  databases  and  the  association-based  query  formulation  are 
described  in  Chapter  3  with  the  help  of  examples.  Chapter  4  formally  defines  the 
concepts  of  Schema  Graph  (SG),  Object  Graph  (OG),  and  association  patterns. 
The  formal  definitions  of  the  association  operators  and  their  simple  mathematical 
properties are also presented. The A-algebra expressions for some example queries
are  given  to  demonstrate  the  utility  of  the  algebra.  Chapter  5  presents  the 
mathematical  properties  of  the  association  operators  and  their  utilities  in  query 
optimization  and  query  decomposition.  The  proofs  of  the  mathematical  properties 
of  the  operators  can  be  found  in  the  Appendix.  The  completeness  of  the  A- 
algebra  is  shown  in  Chapter  6  and  the  conclusion  is  given  in  Chapter  7. 
Figure 1.1  Data independencies in relational databases
Figure 1.2  Architecture of O-O databases


CHAPTER  2 
A  SURVEY  OF  RELATED  RESEARCH 


This chapter surveys some of the existing work related to the development of
the A-algebra. Section 2.1 describes the relational model and the relational
algebra, while Section 2.2 surveys some existing query languages designed for O-O
semantic data models. A query algebra that recently appeared in the literature is
surveyed in Section 2.3.

2.1 Relational Model and Relational Algebra

When  the  hierarchical  and  network  data  models  were  used  extensively  in 
information  systems  in  the  late  1960s,  Codd  [COD70]  raised  an  interesting  and 
important  question:  Can  application  programs  and  terminal  activities  remain 
invariant  as  the  internal  data  representations  (physical  representations)  change? 
He asserted that the future users of large data banks must be protected from having
to know how the data were organized in the machine. Following this
rationale, he conceived the notion of data independence which suggests that the
logical organization of data should be independent of its physical representation.
Determined to demonstrate the validity of his data independence concept, he
proposed a relational data model based on n-ary relations.

The scheme of a relation, R, of an entity set {E1, E2, ..., En} is defined on a
set of m attributes {A1, A2, ..., Am} which correspond to m domains
{D1, D2, ..., Dm} (not necessarily distinct). Each entity (the instance of the scheme)
is represented by an m-ary tuple which has its first attribute value from D1, its
second attribute value from D2, and so forth. A set of attributes of a relation is called a
key if the entities of the relation can be uniquely identified by the values of these
attributes.
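
The notion of a key can be checked mechanically on a given set of tuples. The following Python sketch is an illustration only: the function name `is_key` and the sample tuples over attributes A1, A2, A3 are invented for this example. It tests uniqueness on one relation instance, which can refute a candidate key but cannot by itself establish that the key holds for every legal instance of the scheme.

```python
def is_key(rows, attrs):
    """Return True if no two tuples in `rows` agree on every attribute in `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False  # two tuples share the same values on `attrs`
        seen.add(value)
    return True


# Invented instance of a relation scheme with attributes A1, A2, A3.
rows = [
    {"A1": "x", "A2": 1, "A3": "u"},
    {"A1": "x", "A2": 2, "A3": "u"},
    {"A1": "y", "A2": 1, "A3": "v"},
]

print(is_key(rows, ["A1", "A2"]))  # the pair (A1, A2) identifies each tuple
print(is_key(rows, ["A1"]))        # A1 alone does not
```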

In  particular,  the  information  of  the  suppliers  such  as  their  names,  addresses, 
items  they  supply,  and  the  prices  of  the  items  can  be  represented  by  the  relation 
SUPPLIERS  of  the  following  scheme 

SUPPLIERS(SNAME,  SADDRESS,  ITEM,  PRICE) 

where  the  attributes  SNAME  and  ITEM  form  a  composite  key.  Data  represented 
in this form, which intuitively is a flat table, is the logical view of an application
world.  It  has  nothing  to  do  with  the  physical  representation  of  the  data. 

When designing a database using the relational model, one is often faced with
a choice among alternative sets of relation schemes. Some choices are more favorable
than others for various reasons. For example, the relation SUPPLIERS is not
a desirable scheme because it has the following potential problems: (1) Redundancy:
the address of the supplier is repeated once for each item supplied. (2)
Potential inconsistency (update anomalies): as a consequence of the redundancy,
the update of the address of a supplier in one tuple will leave it inconsistent with
the address in another tuple of the same supplier. (3) Insertion anomalies: the
address of a supplier cannot be recorded if that supplier does not currently supply
at least one item, since SNAME and ITEM form a composite key of the relation
SUPPLIERS. (4) Deletion anomalies: the inverse to problem (3) is that should all
the items supplied by one supplier be deleted, we unintentionally lose the address
of that supplier.

The causes of these problems and their solutions are related to the functional
dependencies among the attributes of a relation [COD70, ULL82]. Suppose
X and Y are two sets of attributes of a relation. Y functionally depends on X (or
X functionally determines Y), denoted by X → Y, if any two tuples of the relation
having the same values in attributes X agree on the values of the attributes in Y.
The above four problems emerge if X → Y and X' → Z hold simultaneously, where
X' stands for a proper subset of X and Z a set of attributes of the relation.
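
The definition of functional dependency translates directly into a test on a relation instance. The Python sketch below is a minimal illustration under stated assumptions: the function name `holds` and the sample SUPPLIERS tuples are invented for this example. As with any instance-level check, it can show that a dependency is violated but not that it holds for all legal instances.

```python
def holds(rows, x, y):
    """Check whether the functional dependency x -> y holds in `rows`.

    Any two tuples that agree on the attributes in `x` must also agree
    on the attributes in `y`.
    """
    image = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault records the first y-value seen for this x-value.
        if image.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Invented instance of SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE).
SUPPLIERS = [
    {"SNAME": "Acme", "SADDRESS": "12 Main St", "ITEM": "piston", "PRICE": 10},
    {"SNAME": "Acme", "SADDRESS": "12 Main St", "ITEM": "valve", "PRICE": 4},
    {"SNAME": "Bolt", "SADDRESS": "9 Elm Rd", "ITEM": "piston", "PRICE": 11},
]

print(holds(SUPPLIERS, ["SNAME", "ITEM"], ["PRICE"]))  # whole key determines PRICE
print(holds(SUPPLIERS, ["SNAME"], ["SADDRESS"]))       # proper subset of the key
print(holds(SUPPLIERS, ["SNAME"], ["PRICE"]))          # violated by the Acme tuples
```

In this instance a proper subset of the composite key (SNAME) determines another attribute (SADDRESS), which is the situation the text identifies as the source of the four problems.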

The solution to these problems is to decompose a relation based on the functional
dependencies among attributes. For example, the functional dependencies
among attributes of the relation SUPPLIERS are (SNAME, ITEM) → PRICE and
SNAME → SADDRESS, which give rise to the redundancy, update, insertion, and
deletion anomalies. It should be clear to the reader that these problems will be
eliminated if the relation SUPPLIERS is decomposed into two relations


SA(SNAME,  SADDRESS)  and 
SIP(SNAME,  ITEM,  PRICE). 


There is, however, a disadvantage to the above decomposition: to find the address
of a supplier who supplies item "piston", a join operation has to be applied since
SADDRESS and ITEM are logically distributed in two relations.
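
The decomposition and the join it makes necessary can be sketched as follows. This Python fragment is an illustration only: the sample tuples and the helper `join_on_sname` are invented, and relations are modeled simply as sets of positional tuples.

```python
# Invented instance of SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE).
SUPPLIERS = {
    ("Acme", "12 Main St", "piston", 10),
    ("Acme", "12 Main St", "valve", 4),
    ("Bolt", "9 Elm Rd", "piston", 11),
}

# Decomposition by projection onto the two schemes given in the text.
SA = {(sname, saddr) for (sname, saddr, _item, _price) in SUPPLIERS}
SIP = {(sname, item, price) for (sname, _saddr, item, price) in SUPPLIERS}


def join_on_sname(sa, sip):
    """Natural join of SA and SIP on their common attribute SNAME."""
    return {
        (sname, saddr, item, price)
        for (sname, saddr) in sa
        for (sname2, item, price) in sip
        if sname == sname2
    }


# Each address is now stored once per supplier rather than once per item.
print(sorted(SA))

# Finding the addresses of suppliers of "piston" requires the join.
piston_addresses = {
    saddr for (_sname, saddr, item, _price) in join_on_sname(SA, SIP)
    if item == "piston"
}
print(sorted(piston_addresses))
```

For this instance the join of SA and SIP reproduces the original SUPPLIERS tuples exactly, so no information is lost by the decomposition; the cost is the extra join at query time.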
The decomposition of a relation based on the functional dependencies among
its attributes is a novel issue of normalization in the relational model. Four types
of normal forms, denoted by 1NF, 2NF, 3NF, and Boyce-Codd-NF, respectively,
have been recognized in considering the functional dependency [COD70, ARM74,
and BEE77]. The Boyce-Codd-NF is the strongest of these normal forms. Relations
in these normal forms may have to be further decomposed into 4NF or 5NF
to eliminate multivalued dependencies [FAG77, DEL78, and ZAN76] and join
dependencies [AHO79]. This decomposition is needed to eliminate further redundancy
and anomalies.

The success and popularity of the relational model and the relational database
management systems (DBMSs) are due to the model's simplicity in structural (tabular)
representation and its sound theoretical basis: the relational algebra and the relational
calculus [COD72a]. The relational algebra defines five primitive operators,
of which two are unary operators [Projection (π) and Selection (σ)] and three are
binary operators [Cross-product (×), Union (+), and Difference (-)]. Other operators
such as Join, Natural-join, Set-intersection, and Set-division are also defined
in the algebra. Although these latter operators are easy to use, they are not primitive
since they can be expressed in terms of the primitive operators.
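
The claim that the non-primitive operators reduce to the primitive ones can be illustrated with a short sketch. The Python below is an assumption-laden illustration, not an implementation from any DBMS: relations are modeled as sets of positional tuples, attributes are addressed by position, and all function names are invented. Set-intersection is derived from Difference alone, and Join from Selection over Cross-product.

```python
# The five primitive operators, over relations modeled as sets of tuples.
def project(r, positions):
    return {tuple(t[i] for i in positions) for t in r}


def select(r, predicate):
    return {t for t in r if predicate(t)}


def cross_product(r, s):
    return {a + b for a in r for b in s}


def union(r, s):
    return r | s


def difference(r, s):
    return r - s


# Derived operators, written only in terms of the primitives above.
def intersection(r, s):
    # R intersect S = R - (R - S)
    return difference(r, difference(r, s))


def join(r, s, predicate):
    # Join = Selection applied to the Cross-product
    return select(cross_product(r, s), predicate)


R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}

print(sorted(intersection(R, S)))
print(sorted(join(R, S, lambda t: t[0] == t[2])))
```

Note that `union`, `difference`, and hence `intersection` are only meaningful here when both arguments have tuples of the same arity, which mirrors the union-compatibility requirement of the relational algebra.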

The relational algebra has the closure property, since every operator
operates on one or more relations and produces a new relation. Operators of the
relational algebra basically operate on the values of tuples in relations. Structurally
speaking, they are defined to operate on tuples whose structures are
union-compatible (homogeneous). The relational algebra is complete in the sense that it
has the equivalent expressive power to the relational calculus [COD72a and
ULL82]. Because of this, it serves as the theoretical basis for the relational model.
The relational algebra has been used for the following three purposes, although it
has not been previously implemented in any existing DBMSs exactly as defined
[ULL82]:

(1)  It  creates  a  new  class  of  query  languages  called  algebraic  languages.  Based  on 
the  relational  algebra,  languages  that  directly  adopt  the  relational  operators 
can  be  developed,  such  as  ISBL  [TOD76]  which  is  a  close  approximation  to  the 
relational  algebra.  Although  languages  of  this  type  are  mostly  procedural,  it  is 
relatively  easy  to  demonstrate  their  completeness  along  with  the  mathematical 
properties  of  the  relational  algebra  which  can  be  readily  applied  to  query 
optimization  and  query  decomposition. 

(2)  It  not  only  serves  as  a  benchmark  for  evaluating  query  languages  in  existing 
systems,  but  also  as  the  criterion  for  designing  new  languages  for  relational 
DBMSs.  A  relational  language  will  not  have  the  necessary  expressive  power  if 
it  is  not  relationally  complete  [ULL82]. 

(3) It provides a mathematical basis for transforming expressions in query
decomposition and (logical or conceptual) query optimization. As an algebra,
the mathematical properties of the relational algebra can be explored precisely
and systematically. For query languages construed as algebraic languages,
these mathematical properties exhibit a straightforward application [HAL76].
Query languages like SQUARE or SEQUEL having certain algebraic features
may also use these properties, since the parse of a query yields a tree in which
some nodes represent relational algebra operators [AST76]. Even if a query
language such as QUEL is a relational calculus language, its calculus-like
expressions are translated into relational algebra expressions in the QUEL
optimizer [WON76].

The total content proposed by Codd before 1979 on the relational model is
referred to as Version 1 of the relational model (RM/V1), whose modeling capabilities
were extended by Codd in 1979 [COD79] to version RM/T (T for Tasmania).
Based on these two versions, Codd [COD90] introduces Version 2 of the relational
model (RM/V2). The most important additional features in RM/V2 are as
follows:

(1)  A  new  treatment  of  items  of  data  missing  because  they  represent  properties 
that  happen  to  be  inapplicable  to  certain  object  instances. 

(2)  New  features  supporting  all  kinds  of  integrity  constraints,  especially  the  user- 
defined  integrity  constraints. 

(3)  A  more  detailed  account  of  view  updatability. 

(4)  New  features  pertaining  to  the  management  of  distributed  databases. 

It is important to recognize the fact that hierarchical and network models as
well as the relational model evolved during a time in which the primary applications
of information systems were business-oriented. In an attempt to apply these
techniques to the more complicated application areas such as CAD/CAM, CASE,
and decision support, it is found that the relational model is no longer adequate
for modeling these advanced applications. The inadequacies of the relational
model are summarized as follows. First, the relational model has limited modeling
capabilities. When data are logically represented in the form of relations, the
relationships among entities in these relations are represented by matching values of
the attributes or keys in one relation with values of the attributes or foreign keys
in other relations. The actual semantics among the data such as generalization
and aggregation (the abstract data type) cannot be modeled by the relational
model. Second, the relational model only models the structural aspects of entities,
and thus, ignores their behavioral aspects (e.g., system-defined and user-defined
operations). Third, in these advanced applications, the concept of data independence
should be further extended to the concept of object encapsulation, i.e., not
only should the logical representation of an object be separated from its physical
representation, but its structural and behavioral properties should be logically
encapsulated in its class. The object encapsulation concept cannot be realized in
the relational model, since the data describing an entity may be logically scattered
among several relations due to normalization [COD70, COD72b, BEE77, and
ULL82]. Fourth, entities with complex structures and complicated relationships
among entities are not representable by flat tables (relations). Finally, it cannot
represent and operate on entities with different (heterogeneous) structures.

2.2 Existing O-O Query Languages

An extensive literature search on query languages for accessing O-O databases
such as GEM [ZAN83, TSU84], ARIEL [MAC85], DAPLEX [SHI81], FAD
[BAN87], POSTQUEL [ROW87], EXCESS [CAR88], as well as other proposed
languages [STO84, DAD86, MAN86, SER86, BAN87, FIS87, BAN88, COL89,
SHA90] has been carried out. This section surveys a representative sample of
these languages. Most existing query languages have capabilities beyond those
provided by their theoretical bases. For example, the arithmetic operations and
aggregation functions provided by the relational languages are not available in the
relational algebra. Therefore, this survey is limited to those features which are
relevant to the proposed algebra.

To demonstrate the similarities and differences of these languages, the same
database schema, shown in Figure 2.1, is used for the example queries written in
GEM, ARIEL, and DAPLEX. The sample schema of Figure 2.1 is for a government-owned
laboratory system, where rectangles represent classes and edges (links)
represent attributes.

QUEL [STO76, WON76, and ZOO77] is a tuple-calculus oriented query language for
the relational DBMS INGRES [STO76]. In order to avoid the ambiguity which arises
when two attributes of different relations having the same name are addressed in
a single query, QUEL uses a "dot" mechanism to qualify an attribute of a relation
(i.e., a dot is inserted between the name of the relation and the name of the
attribute). For example, Equipment.Name refers to the attribute Name of the
relation Equipment. Influenced by this mechanism, the existing O-O query
languages use similar notations for navigating the database schema from one class
to another, or from one relation to other relations in systems which use
relational databases as their back-ends.

The language GEM [ZAN83, TSU84] is an extension of QUEL for the data model DSIS,
which supports aggregation, generalization, and unique identification of objects.
In GEM, a class in an aggregation hierarchy that has a link emanating to another
class has the name of the latter class as the data type of one of its attributes.
For example, the class Lab has an attribute, Facility, of the type Equipment, and
has another attribute, Locality, of the type Location, and so forth. The dot
notation is used in GEM for navigating along the reference attributes (links) in
query formulation. The following GEM query retrieves the name of the manager,
the serial number of the equipment, and the address for each laboratory whose
headquarters is located in New York.


Range of Lab is Lab
Retrieve Lab.Manager.Name
         Lab.Equipment.Serial#
         Lab.Location.Address
Where Lab.Manager.Department.Headquarters.City = "New York"


This  query  returns  a  set  of  tuples  in  a  tabular  form.  Each  tuple  contains 
values  for  the  manager's  name,  the  equipment  serial  number,  and  the  address  of 
the  laboratory  of  interest. 
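
The navigation along reference attributes and the flattening of the result into
tuples can be illustrated with a small Python sketch. It is not GEM itself: the
dictionary-based objects, the helper names `navigate` and `retrieve`, and the
sample data are all assumptions made for illustration only.

```python
from itertools import product


def navigate(obj, path):
    """Follow a dotted path such as 'Manager.Name' starting from obj.

    A list is returned because a reference attribute may be
    single-valued or multi-valued (stored here as a Python list).
    """
    frontier = [obj]
    for attr in path.split("."):
        next_frontier = []
        for node in frontier:
            value = node[attr]
            next_frontier.extend(value if isinstance(value, list) else [value])
        frontier = next_frontier
    return frontier


def retrieve(objects, target_paths, where_path, where_value):
    """Return a flat table: one tuple per combination of retrieved values."""
    rows = []
    for obj in objects:
        if where_value in navigate(obj, where_path):
            columns = [navigate(obj, path) for path in target_paths]
            rows.extend(product(*columns))
    return rows


# Hypothetical sample data (not taken from Figure 2.1).
hq_ny = {"City": "New York"}
hq_la = {"City": "Los Angeles"}
labs = [
    {"Manager": {"Name": "Jones", "Department": {"Headquarters": hq_ny}},
     "Equipment": [{"Serial#": "E1"}, {"Serial#": "E2"}],
     "Location": {"Address": "12 Elm St."}},
    {"Manager": {"Name": "Smith", "Department": {"Headquarters": hq_la}},
     "Equipment": [{"Serial#": "E3"}],
     "Location": {"Address": "9 Oak Ave."}},
]

result = retrieve(
    labs,
    ["Manager.Name", "Equipment.Serial#", "Location.Address"],
    "Manager.Department.Headquarters.City",
    "New York",
)
```

In this sketch the laboratory with two pieces of equipment contributes two rows,
so the manager's name and the address are repeated in the flat result.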

In the approach described in Stonebraker et al. [STO84], the dot notation is used
in a manner similar to that found in GEM to implement the abstract data type
(ADT) concept. In addition, QUEL is used as a data type to facilitate the
navigation from one relation to another. A relation may have a field of type
QUEL which may contain expressions or commands (queries). Whenever the field is
addressed in a query, these expressions, in whole or in part, will be activated.
In general, if X is the tuple variable of the relation R1, Y is a field of type
QUEL in relation R1, and the query stored in Y retrieves field Z of another
relation, R2, then the expression X.Y.Z is a field in a collection of this view.
In other words, the expression will return the values of the Z field of tuples
(in R2) that are related to X through Y. For example, let the relation Manager
have a field called OfficeInfo of type QUEL which contains a query that retrieves
the telephone number of the relation Location. The expression
Manager.OfficeInfo.Tel# returns the telephone number for each manager in a
tabular format. Clearly, the implementation of QUEL as a data type provides a way
to relate data in two relations without modifying the database schema.

Instead  of  using  the  dot  notation,  ARIEL  [MAC85]  takes  advantage  of  the 
"OF"  notation.   The  example  query  described  for  GEM  can  be  restated  as 


Range of Lab is Lab
Retrieve Name OF Manager OF Lab
         Serial# OF Equipment OF Lab
         Address OF Location OF Lab
Where City OF Headquarters OF Department OF Manager OF Lab = "New York"


using  the  "OF"  notation  which  is  linguistically  more  natural  than  using  the  dot 
notation.  However,  the  result  of  this  query  is  also  represented  by  a  flat  table 
(relation). 

DAPLEX [SHI81] is a functional data language. The data retrieval component of
DAPLEX is similar to the languages described above, although it is interpreted
differently. In the functional paradigm, a link (i.e., attribute) emanating from
one class to another class is considered as a function. The function has, by
default, the name of the class to which the link points. For example,
Location(Lab) and Department(Headquarters) represent the facts that Lab has
Location and Headquarters has Department as attributes, respectively. When the
function Location(Lab) is applied to an object of the class Lab, it returns a
value which is an object in the domain class over which the attribute is defined.
If the navigation is from one class to another through a sequence of classes, a
nested function is used. For instance, the expression Name(Manager(Lab))
specifies the name of the manager who is responsible for a laboratory. For a
particular object of Lab, the manager of the laboratory is produced first; then,
the function Name() is applied to the returned manager and returns the name of
the manager. The example query can be expressed in DAPLEX as follows.


FOR EACH Lab
SUCH THAT City(Headquarters(Department(Manager(Lab)))) = "New York"
PRINT Name(Manager(Lab)),
      Serial#(Equipment(Lab)),
      Address(Location(Lab))


Even  though  DAPLEX  is  based  on  the  functional  paradigm,  it  returns  data  in  the 
form  of  a  relation  just  like  in  GEM  and  in  ARIEL. 

Banerjee et al. [BAN88] introduce a query language based on message passing. In
the message passing paradigm, the name of a link emanating from a class is
interpreted as the name of a message which is stored within that class. One can
assume there is actually a message created by the system and having, by default,
the same name as its corresponding attribute. When such a message is sent to an
instance of the class, it returns the value of the attribute. For example, the
following is an expression for selecting a laboratory that has a manager who
belongs to a subordinate department of its New York headquarters.


(Lab SELECT :S (:S Manager Department Headquarters City = "New York"))


SELECT in this expression is a message sent to the class Lab. The first argument
of SELECT is :S, an iteration variable. The SELECT message iterates over the
instances of the class Lab with :S bound to one instance at a time. The block of
code within the parentheses is the second argument of SELECT, and is executed
for each value bound to :S. In this particular block, the message Manager is
sent to the instance bound to :S in order to return the related Manager
instance. Similarly, Department, Headquarters, and City are messages. To
elaborate, Department is sent to the returned Manager instance, Headquarters is
sent to the returned Department instance, and City is sent to the returned
Headquarters instance. The sign "=" is also a message which has the argument
"New York". When this message is sent to the resulting City value, it returns a
logical object TRUE or FALSE. An instance of Lab is qualified for the above
expression if and only if the returned logical object is TRUE. The logical AND
or OR message can be sent to this object with an argument that specifies some
other condition on the instance of Lab. In principle, though not described in
Banerjee et al. [BAN88], similar message-based expressions can be used to
retrieve attribute values of the resulting Lab instance. The result of a query
which involves such conditions is the set of the instances of Lab along with
their attribute values and is represented in a tabular form.
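
The iteration performed by SELECT and the chaining of attribute messages can be
sketched in Python as follows. This is only an approximation of the paradigm
under stated assumptions: the class `Obj`, the helpers `send_chain` and
`select`, and the sample instances are hypothetical, and the "=" message is
modeled with Python's `==` operator rather than as a true message send.

```python
class Obj:
    """A database instance; each attribute name doubles as a message name."""

    def __init__(self, **attributes):
        self._attributes = attributes

    def send(self, message):
        # The default, system-created message returns the attribute value.
        return self._attributes[message]


def send_chain(receiver, messages):
    """Send messages left to right, each to the result of the previous one."""
    for message in messages:
        receiver = receiver.send(message)
    return receiver


def select(instances, block):
    """Bind each instance in turn (the role of :S) and keep it if block is True."""
    return [s for s in instances if block(s)]


# Hypothetical sample instances.
ny = Obj(City="New York")
la = Obj(City="Los Angeles")
lab1 = Obj(Name="Lab-1", Manager=Obj(Department=Obj(Headquarters=ny)))
lab2 = Obj(Name="Lab-2", Manager=Obj(Department=Obj(Headquarters=la)))

qualified = select(
    [lab1, lab2],
    lambda s: send_chain(s, ["Manager", "Department", "Headquarters", "City"])
    == "New York",
)
names = [lab.send("Name") for lab in qualified]
```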

As shown in the samples of these query languages, their query formulations,
though interpreted differently, are very similar to each other. This is evident
in the fact that query formulation is accomplished by navigating the graphically
represented database schema from class to class through their respective links.
In each of these languages, however, a query operates on a database that is
structurally represented using an O-O data model and returns a result whose
structure is represented in a tabular form. Consequently, the result of a query
cannot be further queried by other queries written in the same language.
Therefore, these languages are not closed.

Another  drawback  of  these  languages  is  seen  in  their  navigation  mechanisms 
which  can  only  formulate  queries  against  classes  (or  relations)  that  are  interre- 
lated in  simpler  patterns  like  the  linear  and  forest  structures  shown  in  Figure  2.2a. 
However,  in  O-O  databases,  the  graphical  patterns  in  which  objects  are  inter- 
related with  each  other  are  basically  networks  which  are  not  restricted  to  plane 
graphs  (a  graph  is  a  plane  graph  if  it  can  be  drawn  on  a  plane  without  any  inter- 
section of  two  edges).  They  can  be  as  complicated  as  surface  graphs  (a  graph  is  a 
surface  graph  if  it  can  be  drawn  on  a  surface  without  any  intersection  of  two 
edges).  Phrasing  queries  against  classes  that  are  interrelated  in  more  complicated 
patterns  depicted  in  Figure  2.2b  is  beyond  the  capabilities  of  these  languages. 

A third drawback of these languages, which renders their navigation mechanisms
insufficient, is that only one type of relationship (an object is related to
another object) between objects of two classes can be expressed. In fact, when
two classes are directly linked at the schema level, objects in these two classes
may have another type of relationship: an object is not related to another
object. This type of relationship represents the complement aspect of the
semantics specified for the two associated classes, such as not-a-part-of,
not-a-function-of, or is-not-a, which is often needed in querying the databases.
For example, "For each laboratory, list the equipment that is not available" is a
reasonable query.
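
The complement relationship can be made concrete with a short Python sketch. The
object names and the set of links below are hypothetical; the point is only that
the "is not related to" pairs are exactly those pairs of objects from the two
linked classes that are absent from the stored association.

```python
# Hypothetical instances of two directly linked classes.
labs = ["L1", "L2"]
equipment = ["E1", "E2", "E3"]

# Stored association: (lab, equipment) pairs that ARE related.
available = {("L1", "E1"), ("L1", "E2"), ("L2", "E3")}


def not_related(left, right, links):
    """For each left object, list the right objects it is NOT linked to."""
    return {a: [b for b in right if (a, b) not in links] for a in left}


# "For each laboratory, list the equipment that is not available."
unavailable = not_related(labs, equipment, available)
```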

The proposed query languages [DAD86, MAN86, BAN87, ROW87, CAR88, COL89] use
nested relations as their logical views of databases. A nested relation is a
generalized relation, i.e., a recursively defined relation: the attributes of a
relation can be either atomic values or another relation in which the attributes
can be a third relation, and so forth. Figure 2.3 shows an example of a nested
relation. Nested relations are particularly suitable for representing data in
forest structures. The above languages are considered to be closed, since
operators in these languages operate on nested relations and produce nested
relations. However, they also have the drawbacks mentioned above, and it is our
view that nested relations are not a proper logical representation for an O-O
database, which consists of networks of objects, object classes, and their
associations. Using nested relations to represent data in network structures
introduces one level of indirection. Mapping from a network representation to
nested relations is an extra process. Furthermore, in order to use a nested
relation to represent complex structures, a large amount of data has to be
replicated in the representation. Figure 2.4 shows an example of using a nested
relation to represent a graph having loops. Note that vertex F has to be
replicated three times.
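
The replication effect can be reproduced with a small Python sketch. The graph
below is a hypothetical one (it is not the graph of Figure 2.4): vertex F is
reachable along three different paths, so unfolding the network into a nested
(tree-shaped) structure copies F once per path.

```python
# Hypothetical network: B, C, and D all share the single vertex F.
graph = {
    "A": ["B", "C", "D"],
    "B": ["F"],
    "C": ["F"],
    "D": ["F"],
    "F": [],
}


def unfold(graph, node):
    """Unfold an acyclic network into a nested (node, children) structure."""
    return (node, [unfold(graph, child) for child in graph.get(node, [])])


def count(tree, label):
    """Count how many copies of label appear in the nested structure."""
    node, children = tree
    return int(node == label) + sum(count(child, label) for child in children)


nested = unfold(graph, "A")
copies_of_f = count(nested, "F")
```

The network stores F once, while the nested form stores it three times; with
longer shared substructures the replicated data grows accordingly.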

2.3 ENCORE O-O Data Model and Its Underlying Query Algebra

In spite of the popularity of the O-O paradigm and its application in the field
of database management, the existing O-O database management systems still lack
a solid mathematical foundation for the manipulation of an O-O database and the
optimization of queries. Recently, a query algebra [SHA90] was proposed for the
ENCORE O-O data model [ELM89]. This section surveys the query algebra as well as
the ENCORE model. It also serves as a comparison to the association algebra
proposed in this dissertation.


2.3.1 The ENCORE Model


The ENCORE O-O data model [ELM89] supports abstract data types, type
inheritance, typed collections of typed objects, objects with identity, and
object encapsulation. It models an application as networks of objects, object
types, and their associations. The definition of an abstract data type in this
model includes the Name of the type, a set of Properties defined for instances of
the type, and a set of Operations which can be applied to instances of the type.
Properties reflect the state of an object while operations may perform arbitrary
actions. Properties are typed objects that may be implemented as stored values,
procedures, or functions. The implementation of a property is invisible to the
user and is assumed to return an object of the correct type and to have no
side-effects.

In addition to user-defined abstract data types and a collection of atomic types
such as Int, String, Boolean, etc. (i.e., primitive-classes), ENCORE provides
two parameterized types and a global Object type which is the supertype of all
other types. The parameterized type Set[T] defines T as the type, or supertype,
of objects in a collection having type Set, and T is called the member type of
the set. The parameterized tuple type associates types ($T_i$) with attribute
names ($A_i$) and defines properties Get_attribute_value and operations
Set_attribute_value for each attribute. The $T_i$'s can be any database types,
thus allowing nesting of tuple types. The value of a tuple is represented as
$\langle A_1 : o_1, A_2 : o_2, \ldots, A_n : o_n \rangle$, where the $A$'s are
attributes of the tuple and the $o$'s are objects of the corresponding types.

The global supertype Object defines a family of operations for equality called
i-equality, where i indicates how "deeply" a comparison of two objects must
search before finding equality. Two objects are identical when they are the same
object, i.e., they have the same identity. Identical objects are 0-equal ($=_0$
or just $=$) and, for $i > 0$, two objects are i-equal ($=_i$) if

(1) they are both collections of the same cardinality and there is a one-to-one
correspondence between the collections such that corresponding members are
$=_{i-1}$, or

(2) they both have the same type (not a collection type) and the values of
corresponding properties are $=_{i-1}$.
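
The recursive definition can be sketched in Python as follows. This is a minimal
illustration, not ENCORE's implementation: the class `Obj`, the use of Python
lists for collections, and the treatment of atomic values (assumed to be
self-named, so that identity coincides with value equality) are assumptions
introduced here.

```python
class Obj:
    """A typed, non-collection object whose properties are held in a dict."""

    def __init__(self, type_name, **props):
        self.type_name = type_name
        self.props = props


ATOMIC = (int, float, str, bool)


def i_equal(a, b, i):
    """Return True if a and b are i-equal under the definition above."""
    if a is b:                       # identical objects are 0-equal
        return True
    if isinstance(a, ATOMIC) or isinstance(b, ATOMIC):
        # Assumption: atomic values are self-named, so identity is value equality.
        return type(a) is type(b) and a == b
    if i == 0:
        return False
    if isinstance(a, list) and isinstance(b, list):
        # Case (1): same cardinality and a one-to-one correspondence.
        return len(a) == len(b) and _match(a, b, i - 1)
    if isinstance(a, Obj) and isinstance(b, Obj):
        # Case (2): same type and corresponding properties (i-1)-equal.
        return (a.type_name == b.type_name
                and a.props.keys() == b.props.keys()
                and all(i_equal(a.props[k], b.props[k], i - 1) for k in a.props))
    return False


def _match(xs, ys, depth):
    """Backtracking search for a one-to-one correspondence between xs and ys."""
    if not xs:
        return True
    head, rest = xs[0], xs[1:]
    for j, y in enumerate(ys):
        if i_equal(head, y, depth) and _match(rest, ys[:j] + ys[j + 1:], depth):
            return True
    return False


# Hypothetical objects: two distinct but structurally equal addresses.
addr1 = Obj("Addr", City="Boston")
addr2 = Obj("Addr", City="Boston")
p1 = Obj("Person", Home=addr1)
p2 = Obj("Person", Home=addr2)
```

Here `addr1` and `addr2` are not 0-equal but are 1-equal, while `p1` and `p2`
become equal only at depth 2, since their Home properties must first be compared
at depth 1.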

Type  Object  also  defines  a  stronger  notion  of  equality  called  id-equality. 
Two  objects  are  id-equal  at  depth  i  if  they  are  i-equal  and  graphical  representa- 
tions of  the  objects  are  isomorphic. 

2.3.2 The Underlying Query Algebra of ENCORE

The query algebra [SHA90] is proposed based on the O-O model ENCORE. The domain
of the query algebra is defined as a typed collection of typed objects. A typed
collection is of parameterized type Set[T] and the objects in the collection are
of type T. If objects of a collection are collected from different types, T is
their most specific common supertype in the type lattice. For example, if object
s is of type S, object p is of type P, and S is a supertype of P, the collection
of objects s and p is of type Set[S]. The query algebra is closed since the
operators of the query algebra operate on collection(s) of objects with type
Set[$T_i$] and produce a collection with type Set[$T_k$], where type $T_k$ is
defined by the query.
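
How the member type of a mixed collection is determined can be sketched in
Python. The lattice below is hypothetical (the type Q is added only to exercise
the fallback to Object), and the sketch assumes that the deepest common
supertype is unique.

```python
# Hypothetical type lattice: each type maps to its direct supertypes.
SUPERTYPES = {
    "Object": [],
    "S": ["Object"],
    "P": ["S"],
    "Q": ["Object"],
}


def ancestors(type_name):
    """Return the type itself plus all of its direct and indirect supertypes."""
    found = {type_name}
    for parent in SUPERTYPES[type_name]:
        found |= ancestors(parent)
    return found


def depth(type_name):
    """Length of the longest supertype chain from type_name up to Object."""
    parents = SUPERTYPES[type_name]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def member_type(object_types):
    """Most specific common supertype of the given types (assumed unique)."""
    common = set.intersection(*(ancestors(t) for t in object_types))
    return max(common, key=depth)
```

For the example in the text, a collection holding an object of type S and an
object of type P receives member type S, i.e., it is of type Set[S].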

Similar to the languages surveyed in Section 2.2, the query algebra addresses a
property of an object using 'dot' notation (e.g., s.p.q where s is an object of
type $T_1$, p is a property of s and is of type $T_2$, and q is a property of p
and is of type $T_3$).

Twelve  operators  are  defined  in  this  algebra.  We  give  their  brief  definitions 
followed  by  some  example  queries  to  illustrate  the  major  concepts  of  this  algebra. 

(1)  The  Select  operation  creates  a  collection  of  objects  which  satisfy  a  selection 
predicate. 

$\mathrm{Select}(S, p) = \{\, s \mid (s \in S) \land p(s) \,\}$

where  p  is  the  predicate. 

(2)  The  Image  operation  is  used  to  return  a  single  object  for  each  object  in  the 
queried  collection  and  has  the  form: 

$\mathrm{Image}(S, f : T) = \{\, f(s) \mid s \in S \,\}$

where S is a collection of objects and f returns an object of type T.

(3)  The  Project  operation  extends  Image  by  allowing  the  application  of  many 
functions  to  an  object,  thus  supporting  the  creation  and  maintenance  of 
selected  relationships  between  objects.  The  relationships  are  stored  as  tuples 
with  Tuple  type. 

$\mathrm{Project}(S, \langle (A_1, f_1), \ldots, (A_n, f_n) \rangle) = \{\, \langle A_1 : f_1(s), \ldots, A_n : f_n(s) \rangle \mid s \in S \,\}$

where S is of type Set[T], the $A_i$'s are unique attribute names, and each
$f_i$ takes a single input of type T and returns an object of type $T_i$.
Project returns one tuple for each object in the collection being queried. Each
newly created tuple is a new object with a unique object identifier.

(4) The Ojoin operator is an explicit join operator used to create relationships
which are not defined between objects of two collections in the database. It is
essentially a Cartesian product of collections of objects, followed by a
selection of result tuples. For collections S and R, the Ojoin is defined as
follows:

$\mathrm{Ojoin}(S, R, A_1, A_2, p) = \{\, \langle A_1 : s, A_2 : r \rangle \mid s \in S \land r \in R \land p(s, r) \,\}$

where  p  is  a  predicate  (as  in  Select)  defined  over  objects  from  S  and  R.  The 
Ojoin  operation  creates  new  tuples  in  the  database  to  store  the  generated 
relationships.   The  tuples  created  will  have  unique  object  identifiers. 

(5) Union, Difference, and Intersection are the usual set operations with object
comparisons and set membership based on object identity ($=_0$). The result of
these operations is considered to be a collection of objects of type T, where T
is the most specific common supertype (in the type lattice) of the types of the
objects in the operands.

(6) The Flatten operation is used to restructure sets of sets, and Nest and
UnNest allow the representation of tuples as flat or nested relations.

(7) For the above operators, two identical operations do not give identical
responses, since each result collection is a newly identified object in the
database and the objects in a result collection may be either existing database
objects or new tuple objects created during the operation. Operators
DupEliminate and Coalesce are introduced to handle situations where equal
objects are created by a query.

The example queries are issued against the Supplier-Parts-Job database shown in
Figure 2.5. For the purpose of these examples, it is assumed that Type Object is
the only supertype for each of the given types.

Example 1: Find all red parts. Which suppliers can supply all of the red parts?

P_red := Select(Parts, λp p.color = "red")
S_Pred := Select(Suppliers, λs P_red subset_of s.Inventory)

The first selection finds the red parts and the second selection finds all
suppliers for which the inventory includes that set of parts. The subset_of
operation is available since property Inventory and result P_red both have type
Set[Part].

Example 2: What parts are needed by jobs in Boston?

BosJobs := Select(Jobs, λj j.address.city = "Boston")
BosJobParts := Project(BosJobs, λj <(J, j), (Pt, j.PartsNeeded)>)

The select operation finds the jobs in Boston and the project operation gives
information about which parts are needed for each job in Boston. The result of
the projection is of type Set[Tuple]. Note that operation NewPart (of type Job)
cannot be applied to members of BosJobParts, since they have type Tuple.
However, it is appropriate for objects BosJobParts.J.

Example 3: Find all local suppliers for each job.

LocalS := Ojoin(Jobs, Suppliers, J, S, λj λs j.address.city = s.address.city)

This Ojoin operation produces a set of tuples of type <(J,Job),(S,Supplier)>,
which is similar to a normalized relation. To get a set of suppliers for each
job, a Nest operation needs to be applied: Nest(LocalS, S).
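
The three example queries can be approximated in Python to make the behavior of
Select, Project, Ojoin, and Nest concrete. This is a sketch under stated
assumptions and not the ENCORE algebra itself: tuples are modeled as Python
dictionaries rather than as new objects with their own identifiers, object
identity is modeled by `eq=False` dataclasses, and all sample data and function
names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Addr:
    city: str


@dataclass(eq=False)
class Part:
    num: str
    color: str


@dataclass(eq=False)
class Supplier:
    ident: str
    address: Addr
    inventory: set


@dataclass(eq=False)
class Job:
    num: str
    address: Addr
    parts_needed: set


def select(S, p):
    """Objects of S that satisfy predicate p."""
    return {s for s in S if p(s)}


def project(S, *pairs):
    """One tuple (dict) per object, built from (attribute name, function) pairs."""
    return [{name: f(s) for name, f in pairs} for s in S]


def ojoin(S, R, a1, a2, p):
    """Cartesian product of S and R followed by a selection with predicate p."""
    return [{a1: s, a2: r} for s in S for r in R if p(s, r)]


def nest(tuples, attr):
    """Group the values of attr by the remaining attributes of each tuple."""
    groups = {}
    for t in tuples:
        key = tuple((k, v) for k, v in t.items() if k != attr)
        groups.setdefault(key, set()).add(t[attr])
    return [{**dict(key), attr: members} for key, members in groups.items()]


# Hypothetical sample database.
red1, red2, blue = Part("P1", "red"), Part("P2", "red"), Part("P3", "blue")
parts = {red1, red2, blue}
boston, new_york = Addr("Boston"), Addr("New York")
s1 = Supplier("S1", boston, {red1, red2, blue})
s2 = Supplier("S2", new_york, {red1})
suppliers = {s1, s2}
j1 = Job("J1", boston, {red1, blue})
j2 = Job("J2", new_york, {red2})
jobs = {j1, j2}

# Example 1: red parts, and suppliers that can supply all of them.
p_red = select(parts, lambda p: p.color == "red")
s_pred = select(suppliers, lambda s: p_red <= s.inventory)

# Example 2: parts needed by jobs in Boston.
bos_jobs = select(jobs, lambda j: j.address.city == "Boston")
bos_job_parts = project(bos_jobs, ("J", lambda j: j), ("Pt", lambda j: j.parts_needed))

# Example 3: local suppliers for each job, then nested on S.
local_s = ojoin(jobs, suppliers, "J", "S",
                lambda j, s: j.address.city == s.address.city)
local_nested = nest(local_s, "S")
```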

From the above description, we can see that the query algebra supports many
features of O-O databases and has taken significant steps towards a powerful O-O
query algebra to serve as the mathematical foundation for O-O databases.
However, it still has the following limitations.

(1) Although ENCORE models an application as networks of types, objects, and
their associations, the domain of its underlying query algebra is defined as
collections of objects having type Set[T], which is essentially a nested
relation representation, since the member type T of the set type can be a
parameterized Tuple type which may in turn contain attributes of Tuple types.
Therefore, the query algebra cannot represent network-structured relationships
among objects efficiently and the mapping problem addressed before still
remains.

(2) In this algebra, two identical expressions or two identical operations in a
single expression do not give identical responses, since each result collection
is a newly identified object in the database. To eliminate duplicated copies of
the same newly created object, the algebra introduces the DupEliminate and
Coalesce operations, which would not be necessary if it directly supported the
network view of O-O databases.

(3) In this algebra, a collection may contain objects with heterogeneous
structures. For example, two objects may both be of Tuple type but with
different arities, and the union of the two objects is also a collection of
objects having Tuple type. However, other operators in this algebra are not
defined to operate on such collection(s).

(4) Since the query algebra is developed for a specific model (i.e., ENCORE), it
is difficult to apply to other O-O models.

Figure 2.1 A sample schema

Figure 2.2 Simple and complex query patterns: (a) simple query patterns; (b) complex query patterns (plane graphs and surface graphs)

NAME         ADDRESS              INVESTMENTS
                                  COMPANY   SHARES
                                            PURCHASE PRICE   DATE       NO
----------   ------------------   -------   --------------   --------   ---
John Smith   311 East 2nd St.     XEROX     64.50            02/01/83   100
             Bloomington, IN                92.50            08/10/87   200
             47401                IBM       89.75            06/20/83   500
                                            96.50            11/10/84   100
Jill Brody   41 North Main St.    EXXON     35.0             01/30/81   100
             Oberlin, OH                    64.50            01/30/82   100
             44074                          59.50            02/10/83   200
                                  FORD      35.50            02/10/83   200
                                  SEARS     35.75            12/25/87   100

Figure 2.3 An example of a nested relation

Pattern
Number    A    B    C    D    E    F    F    F    G    H
-------   --   --   --   --   --   --   --   --   --   --
1         a1   b2   c4   d3   e2   f5   f5   f5   g1   h6

Figure 2.4 Using a nested relation to represent a complex structure

Type Supplier
  properties:
    Ident: string
    Address: Addr
    Inventory: Set[Part]
  operations:
    RecvOrder: Supplier, Set[Part] --> Supplier

Type Job
  properties:
    Num: string
    Address: Addr
    PartsNeeded: Set[Part]
    Preferred_Suppliers: Ordered_list[Supplier]
  operations:
    NewPart: Job, Part --> Job

Type Part
  properties:
    Num: string
    Address: Addr
    Color: string
    Components: Set[Tuple[<(P,Part),(Qty,Int)>]]
    Plan: drawing
    BillofMaterial: list[Part]
  operations:
    Order: Part --> Part
    Same_Part: Part, Part --> Boolean

Type Addr
  properties:
    Street: string
    City: string
    State: string

Figure 2.5 A Supplier-Parts-Job database


CHAPTER  3 

OVERVIEW  OF  O-O  DATABASES 

AND  ASSOCIATION-BASED  QUERY  FORMULATION 


This chapter informally introduces the graphical view of O-O databases and
illustrates the association-based query formulation mechanism. The graphical
view captures the most important characteristics of O-O databases in which
object classes and their objects are associated with each other. Based on this
view, query formulation and processing can be made by specifying and
manipulating association patterns in which objects are inter-related with each
other, unlike the traditional attribute-based query formulation and processing
which match values in different relations. Since the graphical view is suitable
for many O-O data models, the association algebra developed based on this view
can be used as a general algebra for supporting these O-O databases. The
graphical view of O-O databases is formalized in the next chapter.


3.1 Overview of O-O Databases


O-O semantic data models provide a conceptual basis for defining O-O databases.
Although each model has some unique constructs that distinguish one model from
the others, there are several common structural and behavioral properties based
on which an algebra can be developed and used to support these models:

First, objects are physical entities, abstract concepts, events, processes,
functions, or anything that an application cares to capture and represent.

Second, objects having the same structural and behavioral properties are grouped
together to form an object class. Object classes can be categorized into two
general categories: (1) the nonprimitive-class, which represents a set of
objects of interest in an application world, each of which is assigned a
system-wide unique object identifier (OID) and has its data explicitly entered
in a database by the user; and (2) the primitive-class, which represents a class
of self-named objects serving as a domain for defining other object classes,
such as a class of symbols or numerical values. The behavioral properties of an
object class are defined in terms of system-defined or user-defined operations
(e.g., retrieve, display, delete, insert, rotate a design object, hire an
employee, etc.), which can meaningfully operate on its objects using their
corresponding programs (or methods). The structural properties of an object
class and, thus, its objects consist of two types of data: (1) descriptive data
(or instance variables), which define the states of the objects; and (2)
association data, which specify the relationships between its objects and the
objects of some related classes.

Third, different O-O models recognize different types of associations. Two of
the most commonly recognized associations are aggregation and generalization.
Aggregation models the a-part-of, a-function-of, or a-composition-of
relationship. For instance, a complex object can be modeled by an aggregation
hierarchy (abstract data type) in which a complex object is defined in terms of
its associations with objects in other defined classes. Generalization models
the is-a or the superclass-subclass relationship, in which an object in a
subclass inherits both the structural and the behavioral properties of its
superclass(es).

Thus,  from  the  algebra  point  of  view,  an  O-O  database  can  be  viewed  as  a 
collection  of  objects,  grouped  together  in  classes  and  interrelated  through  associa- 
tions.   It  can  be  represented  by  graphs  at  both  the  intensional  and  the  extensional 
levels.    At  the  intensional  (schema)  level,  a  database  is  defined  by  a  collection  of 
inter-related  object  classes  and  is  represented  by  a  Schema  Graph  (SG).    For 
example,  the  SG  for  a  university    database  is  illustrated  in  Figure  3.1,  in  which 
each  rectangle  denotes  a  nonprimitive-class  such  as  a  class  of  person  objects  or  a 
class  of  department  objects,  and  each  circle  denotes  a  primitive-class  such  as  a 
class  of  names  or  ages.    The  associations  among  classes  are  represented  by  the 
edges  in  SG.    For  example,  there  is  an  association  between  the  class  Course  and 
the  class  Department  (an  Aggregation  association),  and  an  association  between  the 
class  Person   and  the   class   Student   (a   Generalization  association).     Since   the 
semantic  distinctions  of  these  and  other  association  types  recognized  by  different 
semantic  models  can  be  either  hard-coded  in  a  DBMS  or  declaratively  specified  by 
some rules and used by a rule processor to govern the manipulation of the associated
classes, the underlying algebra does not have to incorporate the semantics of
these association types. All it has to be concerned with is whether or not an
object class and its objects are associated with some other classes and their
objects, i.e., the edges (or associations) are type-less in SG. For example, the
semantics of inheritance can be incorporated in a query language translator which
translates a high-level language statement into its underlying algebraic representation.
The algebra does not have to deal directly with the semantics of inheritance.
This is particularly important if the algebra is to be used as a general algebra for
supporting various O-O data models in which an association type may have
slightly different semantics.

At  the  extensional  (instance)  level,  a  database  can  be  viewed  as  a  collection 
of  objects,  grouped  together  in  classes  and  inter-related  through  some  type-less 
associations; and as such it can be represented by an Object Graph (OG). For
example, the OG corresponding to a portion of the university schema graph is
shown in Figure 3.2. In this example, the Teacher object t4 is associated with two
Section objects, thereby representing the fact that he/she is teaching two sections,
sc3 and sc4. The Student object s1 is associated with Undergrad object u1 which,
in turn, is associated with Department object d1, thereby representing that s1 is
an undergraduate student who minors in the department d1. Finally, the Section
object  sc2  is  not  associated  with  any  object  of  the  Student  class,  which  represents 
the  fact  that  it  is  not  taken  by  any  student.  Object  associations  expressed  by 
different  graph  patterns  represent  the  semantic  relationships  among  these  objects 
in  an  application  world. 
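
The type-less object graph described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `ObjectGraph` and its methods are hypothetical and are not part of the A-algebra or of OSAM*.KBMS, and only the associations explicitly mentioned in the text (t4 with sc3 and sc4; s1, u1, and d1; the unassociated sc2) are loaded.

```python
from collections import defaultdict


class ObjectGraph:
    """Type-less object graph: vertices are (class, id) pairs, edges are unordered."""

    def __init__(self):
        self.objects = defaultdict(set)  # class name -> ids of its object instances
        self.edges = set()               # each edge is frozenset({vertex, vertex})

    def add_object(self, cls, oid):
        self.objects[cls].add(oid)

    def associate(self, a, b):
        """Record a (type-less) association between vertices a and b."""
        for cls, oid in (a, b):
            self.add_object(cls, oid)
        self.edges.add(frozenset((a, b)))

    def neighbors(self, vertex, cls):
        """Ids of the objects of class `cls` associated with `vertex`."""
        found = set()
        for edge in self.edges:
            if vertex in edge:
                for other in edge - {vertex}:
                    if other[0] == cls:
                        found.add(other[1])
        return found


og = ObjectGraph()
og.associate(("Teacher", "t4"), ("Section", "sc3"))
og.associate(("Teacher", "t4"), ("Section", "sc4"))
og.associate(("Student", "s1"), ("Undergrad", "u1"))
og.associate(("Undergrad", "u1"), ("Department", "d1"))
og.add_object("Section", "sc2")  # sc2 exists but is taken by no student

print(sorted(og.neighbors(("Teacher", "t4"), "Section")))
print(og.neighbors(("Section", "sc2"), "Student"))
```

Storing each edge as an unordered pair reflects the fact that the associations carry no type and no direction at this level.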

3.2 Pattern-based Query Formulation

Based on this view of an O-O database, users can query the database by
specifying patterns of object associations as search conditions. Once these
objects are selected, they can be further processed by either system-defined
operations (Retrieval, Display, Update, Insert, Delete, etc.) or user-defined
operations (RotatePart, PurchasePart, HireFaculty, etc.). For example, the following
queries can be issued against the university database as illustrated in Figures
3.1 and 3.2 (the algebraic expressions for these queries will be given in Section
4.4).

Query  1:  For  all  sections,  get  the  majors  of  students  who  are  taking  these 
sections. 

To satisfy this query, we can specify a linear pattern containing the classes
Section, Student, and Department as shown in Figure 3.3a. In this pattern, a circle
represents a class and an edge represents that the objects of the two adjacent
circles (classes) must be associated with each other. This pattern is called an
intensional pattern; it specifies that sections taken by students who major in
some departments are to be identified. The answer to this query can be found in
Figure 3.2 by checking whether the objects of these three classes satisfy such a pattern.
There are five object patterns (called extensional patterns) which satisfy the intensional
pattern, as shown in Figure 3.3b. The Section object sc2 and the Student
object s3 do not appear in these extensional patterns, since sc2 is not taken by any
student and s3 does not have a major yet. These patterns can also be identified in
two  sequential  steps.  First,  get  all  the  patterns  in  which  the  Section  objects  are 
associated  with  the  Student  objects.  Then,  if  a  pattern  generated  in  the  first  step 
(i.e.,  a  Section-Student  pair)  is  further  associated  with  an  object  of  Department,  a 
new  pattern  consisting  of  three  objects  is  constructed  and  retained  in  the  result; 
otherwise, the pair is dropped.

Once these objects (as well as their associations) have been identified,
different  system-defined  or  user-defined  operations  defined  on  their  corresponding 
classes  can  be  applied  to  these  selected  objects.  For  example,  Inform(Department) 
can  be  an  operation  defined  on  the  class  Department.  It  sends  each  of  the  selected 
departments  a  letter  concerning  the  majors  of  the  students. 
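
The two-step identification of the extensional patterns for Query 1 can be sketched as follows. The data are hypothetical and do not reproduce Figure 3.2; the helper `extend` is likewise an assumed name used only for illustration.

```python
# Hypothetical associations (NOT the contents of Figure 3.2).
takes = {("sc1", "s1"), ("sc1", "s2"), ("sc3", "s3")}  # Section - Student
majors = {("s1", "d1"), ("s2", "d2")}                  # Student - Department


def extend(patterns, edges):
    """Extend each linear pattern by one association.

    A pattern whose last object has no matching edge is dropped.
    """
    return [p + (y,) for p in patterns for (x, y) in sorted(edges) if p[-1] == x]


step1 = sorted(takes)          # step 1: all Section-Student pairs
step2 = extend(step1, majors)  # step 2: keep pairs that reach a Department

print(step2)
```

In this sketch the pair (sc3, s3) is dropped in the second step because s3 has no major, which mirrors the reason given above for the absence of s3 from the result.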

Suppose  there  is  a  rule  in  the  university  that  a  student  cannot  major  and 
minor  in  the  same  department.  To  check  whether  there  is  such  a  case  in  the 
database,  the  following  query  can  be  issued. 

Query  2:    List  students  who  major  and  minor  in  the  same  department. 

The  intensional  pattern  for  this  query  is  shown  in  Figure  3.3c.  It  can  be 
formed  by  starting  from  the  class  Student  and  navigating  the  schema  in  two 
traversal  paths  (refer  to  Figure  3.1).  One  path  is  from  Student  to  Department, 
which  means  that  a  student  majors  in  a  certain  department;  and  the  other  path  is 
from  Student  to  Department  through  Undergrad,  which  means  that  a  student  is 
an  undergraduate  and  minors  in  a  certain  department  (we  can  see  from  the  SG 
that only undergraduates may have minors). According to the query, a single student
should associate with objects in both Undergrad and Department and these
two paths should merge at Department, thereby forming a loop. This implies two
logical AND conditions, one at the Student class and the other at the Department
class. We use double arcs to denote such conditions as shown in Figure 3.3c.
From Figure 3.2, we can see that the student s1 has his major and minor in the
department d1. This extensional pattern is depicted in Figure 3.3d.

Query 3: For those students taking section 300 and having majors and/or
minors, get their majors and/or minors.

There are several ways to form an intensional pattern for the query. We
may start from Section# and traverse to Student through Section and, then, navigate
the schema in two paths as we did for Query 2. According to the query, a
student who has either a major or a minor should be included in the result (in this
database, it is assumed that graduate students do not have minors). This means
that either path of the navigation will construct a pattern that would satisfy the
query. Thus, a logical OR condition exists at Student. We use a single arc to
indicate the OR condition as shown in Figure 3.4a. As in Query 2, these two
branches merge at Department. However, this query does not require that they
merge at the same Department object. This is specified by the second OR condition
at Department in Figure 3.4a.

The extensional patterns that satisfy this query have heterogeneous structures:
two types of linear patterns as shown in Figure 3.4b. The first type includes
patterns that represent the minors of the undergraduates; and the second type
includes patterns that represent the majors of the students who are either undergraduates
or graduates. In both types of patterns, a student is associated with section
300, which is assumed to be the Section# for sc3. Figure 3.4c will be
described later in Section 4.4.
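
The difference between the AND conditions of Query 2 and the OR conditions of Query 3 can be contrasted in a short sketch. The associations below are hypothetical, apart from the fact taken from the text that s1 majors and minors in d1, and the function names are illustrative only.

```python
# Hypothetical associations; only "s1 majors and minors in d1" comes from the text.
takes = {("sc3", "s1"), ("sc3", "s4"), ("sc1", "s2")}  # Section - Student
major = {("s1", "d1"), ("s4", "d2"), ("s2", "d1")}     # Student - Department
is_undergrad = {("s1", "u1"), ("s2", "u2")}            # Student - Undergrad
minor = {("u1", "d1"), ("u2", "d3")}                   # Undergrad - Department


def minor_paths(student):
    """(undergrad, department) pairs reached from `student` along the minor path."""
    return [(u, d) for s, u in sorted(is_undergrad) if s == student
            for u2, d in sorted(minor) if u2 == u]


def query2():
    """AND at Student and Department: both paths must end at the same department."""
    return [(s, u, d) for s, d in sorted(major)
            for u, d2 in minor_paths(s) if d2 == d]


def query3(section):
    """OR at Student and Department: either path alone yields a pattern."""
    result = []
    for sc, s in sorted(takes):
        if sc != section:
            continue
        result += [(sc, s, d) for s2, d in sorted(major) if s2 == s]
        result += [(sc, s, u, d) for u, d in minor_paths(s)]
    return result


print(query2())
print(query3("sc3"))
```

The result of `query3` mixes three-object and four-object patterns, which is the heterogeneity of structure noted above.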

We have given some example queries which specify how objects are associated
with one another. In the graphical representation of an O-O database, when
there is no edge between two objects even though there is one between their
classes, it implies that the two objects are not associated with each other. This
represents the complement aspect of the semantics between two associated classes.
It is necessary to allow a user to retrieve this type of object non-association from a
database. The following query is such an example. It can also be specified by a
pattern.

Query  4:    For  each  teacher,  list  the  sections  which  he/she  does  not  teach. 

We  use  a  dashed  line  to  represent  the  fact  that  two  objects  are  not  associated 
with  each  other.  Therefore,  the  intensional  pattern  for  this  query  can  be  drawn  as 
in  Figure  3.4d.  There  are  twelve  extensional  patterns  that  match  the  intensional 
pattern.  Figure  3.4e  shows  a  portion  of  them.  Non-association  relationships 
among  objects  are  not  explicitly  stored  in  a  database.  However,  they  can  be 
derived during the processing of this type of query.
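
Because non-associations are not stored, they have to be computed from the stored associations. A minimal sketch of that derivation is given below, assuming hypothetical Teacher and Section objects (not those of Figure 3.2) and an illustrative function name.

```python
from itertools import product


def non_associations(left, right, associations):
    """Pairs (a, b) of the two classes for which no association is stored."""
    return [(a, b) for a, b in product(sorted(left), sorted(right))
            if (a, b) not in associations]


# Hypothetical data: two teachers, three sections.
teachers = {"t1", "t2"}
sections = {"sc1", "sc2", "sc3"}
teaches = {("t1", "sc1"), ("t2", "sc2"), ("t2", "sc3")}

not_taught = non_associations(teachers, sections, teaches)
print(not_taught)
```

The number of derived pairs is the size of the cross product of the two classes minus the number of stored associations, which is why such patterns are better derived on demand than stored.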

Using  the  above  examples,  we  hope  that  we  have  convinced  the  reader  that 
the  pattern-based  query  formulation  is  suitable  for  query  specification  based  on  a 
graphical view of an O-O database.

3.3 Conclusion

The (type-less) graphical representation of O-O databases is applicable to
most O-O data models, since it captures the essential characteristics of O-O data
models in which object classes as well as their objects are inter-related with each
other in different association patterns. Such databases can be queried by
specifying patterns in which objects of interest are associated with each other. It
should be clear that this formulation is quite different from the attribute-based
query formulation in the existing relational query languages, which is based on
matching the attributes (or the key or composite key) of one relation with the
attributes (foreign keys) in other relations. A query that requires the specification
of a complex pattern of object associations can be specified in a rather straightforward
manner in an association-based language, whereas in an attribute-based
language, complex nestings of query blocks or multiple queries would be required
[ALA89a].

It  is  our  view  that  an  algebra  developed  for  processing  data  based  on  the 
graphical  view  of  O-O  databases  and  the  pattern-based  query  formulation  should 
satisfy  the  following  requirements.  First,  it  should  allow  direct  manipulation  of 
complex  patterns  of  object  associations.  Second,  the  closure  property  should  be 
maintained. Third, both association and non-association relationships among
objects  should  be  expressible  as  search  conditions.  Fourth,  it  should  be  complete 
in  the  sense  that  it  can  be  used  to  describe  all  possible  patterns  in  a  database. 
Lastly,  it  must  be  able  to  represent  and  process  patterns  with  both  homogeneous 
and heterogeneous structures.

Figure 3.1 Schema graph of a university database

Figure 3.3 Pattern specifications for Query 1 and Query 2

Figure 3.4 Pattern specifications for Query 3 and Query 4


CHAPTER  4 
ASSOCIATION  ALGEBRA 


The association algebra (A-algebra) is defined based on a uniform representation
of an O-O database in terms of objects, object classes, and type-less associations,
as described in Chapter 3. The algebra contains a number of operators
which operate on graph structures of object associations to produce graph structures.
The closure property of the algebra ensures that the result of a query can
be further manipulated by other queries.

4.1 Definitions

First,  we  formally  define  an  O-O  database  at  both  schema  and  object  levels. 


Schema  Graph  (the  intensional  database): 


The schema graph of an O-O database is defined as $SG(C, A)$, where $C = \{C_i\}$
is a set of vertices representing object classes; $A$ is a set of edges, each of
which, $A_{ij}(k)$, represents an association between classes $C_i$ and $C_j$, where $k$ is a
number for distinguishing the edges from one another when there is more
than one edge between two vertices.


Object  Graph  (the  extensional  database): 


The object graph of an O-O database is defined as $OG(O, E)$, where $O = \{O_{i,j}\}$
is a set of vertices representing object instances (the $j$th object in class $C_i$); and
$E = \{O_{i,j} \overset{k}{\text{---}} O_{m,n}\}$ is a set of edges representing the associations among object
instances. When one object instance is connected with another in the object
graph, a regular-edge (solid line) is drawn between the corresponding vertices
as $O_{i,j} \overset{k}{\text{---}} O_{m,n}$, which specifies that the $j$th object instance in class $C_i$ is
related to the $n$th object instance in class $C_m$ through the $k$th association of
classes $C_i$ and $C_m$. If two object instances $O_{i,j}$ and $O_{m,n}$ are not connected
in the object graph but their classes $C_i$ and $C_m$ in the corresponding SG are
directly connected, a complement-edge (dotted line) is drawn between them
and is denoted by $O_{i,j} \cdots O_{m,n}$.

In this O-O model, an object may participate in several classes (e.g., in a
generalization hierarchy). Its representation in a class is called an object instance.
Since in most cases in this dissertation, "object" and "object instance" can be used
interchangeably without any ambiguity, we shall use "object" unless a distinction
is required between the two.

The  reason  for  explicitly  introducing  complement-edges  into  the  OG  is  to 
allow  the  A-algebra  to  manipulate  both  association  and  non-association  between 
objects  of  two  adjacent  classes.  In  an  actual  O-O  database,  it  is  not  necessary  to 
explicitly  store  the  complement-edges.  Figure  4.1  illustrates  the  regular-edges  and 
complement-edges among the objects of three object classes. For example, we see
that section sc1 is taken by students s2 and s3 (regular-edges) and not taken by
students s1 and s4 (complement-edges).

The  relationship  between  an  OG  and  its  corresponding  SG  is  formally 
described  by  the  following  proposition. 

Proposition 1: An $OG(O, E)$ is a morphism of its corresponding $SG(C, A)$.
The mapping function $F_m$ is defined as

$F_{m,v}: \; C_i \Rightarrow \{O_{i,j}\}$, and

$F_{m,e}: \; A_{im}(k) \Rightarrow \{O_{i,j} \overset{k}{\text{---}} O_{m,n}\}$.

The  mapping  between  SG  and  OG  is  one-to-many,  since  a  database  is 
dynamically  changing  and  may  have  different  instantiations  at  different  times  for 
the same schema graph.

To define "association pattern", we first extend the concept of connected
graph  in  graph  theory  by  treating  complement-edges  as  edges,  i.e.,  a  connected 
graph  is  a  graph  in  which  there  exists  at  least  one  path  between  any  two  vertices 
and  each  path  may  contain  regular-edges,  complement-edges,  or  a  combination  of 
the  two.  We  shall  from  now  on  use  an  upper-case  letter  to  denote  a  class  and  the 
corresponding  lower-case  letter  with  a  subscript  to  denote  an  object  instance  in 
that  class.  We  shall  assume  that  there  is  only  one  edge  between  any  two  vertices 
in  SG  unless  otherwise  specified  so  as  not  to  complicate  the  notation. 

Association  Pattern: 

A  connected  subgraph  of  an  OG  is  an  association  pattern  (or  pattern  for 
short). 

By this definition, a single vertex (or object instance) in OG, which is a connected
subgraph, is also a pattern. We call it an Inner-association-pattern (or
Inner-pattern for short). It is algebraically represented by $(a_i)$ for a vertex of class
A in SG. Thus, object instances are treated as Inner-patterns in the A-algebra. A
regular-edge together with the two vertices (i.e., two Inner-patterns) it connects is
called an Inter-association-pattern (or Inter-pattern), which is represented by $(a_i b_j)$.
A complement-edge together with the two Inner-patterns it connects is called a
Complement-association-pattern (or Complement-pattern) and is represented by
$(\overline{a_i b_j})$. This pattern states that $a_i$ and $b_j$ are not associated with each other in OG.
If a path between vertices $a_i$ and $b_j$ consists of only regular-edges, it can be
represented by a Derived-inter-association-pattern (D-inter-pattern), denoted by
$(\underline{a_i b_j})$; otherwise, it can be represented by a Derived-complement-association-pattern
(D-complement-pattern), denoted by $(\underline{\overline{a_i b_j}})$. When a path is represented
by a derived pattern, it simply means that two vertices are indirectly associated or
non-associated but how they are interrelated (the actual path) is of no importance.
A D-inter-pattern is treated as an Inter-pattern and a D-complement-pattern is
treated as a Complement-pattern in the algebraic operations.

The above five types of patterns are the primitive patterns, the latter four
being binary patterns. Their graphical and algebraic representations are summarized
in Figure 4.2a. All other connected subgraphs are called complex patterns.
For example, the complex pattern shown in Figure 4.2b1 contains three primitive
patterns: two Inter-patterns $(a_1 b_1)$ and $(b_1 c_1)$, and a Complement-pattern $(\overline{b_1 d_1})$. It
can be uniquely defined by its algebraic representation as a set of primitive patterns,
i.e., $(a_1 b_1, b_1 c_1, \overline{b_1 d_1})$. More examples of complex patterns are shown in Figure
4.2b. From these examples, one can observe that a complex pattern can be
decomposed into a set of binary patterns which cannot be further decomposed.
This implies that, in the algebraic representation of a complex pattern, an Inner-pattern
may not occur as an element and a binary pattern may appear only once.
A pattern in this algebraic format is called a normalized pattern; otherwise it is
called an unnormalized pattern. $(b_1, b_1 c_1)$, $(b_2, b_2 c_2)$, and $(a_1 b_1, a_1 b_1)$ are examples
of unnormalized patterns. During the process of constructing an association pattern,
we always normalize it by eliminating the duplicates. The above three patterns
have the normalized forms of $(b_1 c_1)$, $(b_2 c_2)$, and $(a_1 b_1)$, respectively.
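
Normalization as described above can be sketched in a few lines. The encoding is an assumption made for illustration only: an Inner-pattern is a string such as "b1", and a binary pattern is a tuple (kind, x, y) with kind "inter" or "complement". The input is assumed to describe a connected subgraph.

```python
def normalize(primitives):
    """Return the normalized form of a pattern given as a list of primitive patterns.

    Inner-patterns are strings ("b1"); binary patterns are tuples (kind, x, y).
    """
    inner, binary = set(), set()
    for p in primitives:
        if isinstance(p, str):
            inner.add(p)
        else:
            kind, x, y = p
            # Store the end points as an unordered pair: (x y) and (y x) coincide.
            binary.add((kind, frozenset((x, y))))
    # A complex pattern is listed by its binary patterns only; duplicates
    # disappear because the result is a set. A single vertex stays as it is.
    return frozenset(binary) if binary else frozenset(inner)


print(normalize(["b1", ("inter", "b1", "c1")]))
print(normalize([("inter", "a1", "b1"), ("inter", "b1", "a1")]))
```

Representing a normalized pattern as a set of unordered pairs makes both the non-directional property and the irrelevance of the order of primitive patterns hold by construction.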

The definitions of OG and association pattern imply that a pattern is a non-directional
graph, i.e., $(a_i b_j) = (b_j a_i)$, and that the sequence of primitive patterns in
the algebraic representation of a complex pattern is not important, hence

$(a_i b_j, b_j c_k) = (c_k b_j, a_i b_j)$.

Based  on  the  above  definition  and  notion  of  association  pattern,  we  view  an 
OG  as  an  Association  Graph  (AG)  and  all  the  association  patterns  in  AG  form  the 
domain of the A-algebra, denoted by A.

4.2 Relationship Between Two Association Patterns

The  operators  of  the  A-algebra  are  defined  based  on  the  possible  relationships 
between  two  patterns  in  A,  so  that  they  can  be  used  either  to  construct  complex 
patterns  using  simpler  patterns  or  to  decompose  a  complex  pattern  into  several 
patterns of simpler structures. There are four possible relationships between two
patterns $p^1$ and $p^2$: non-overlap, overlap, contain, and equal.

(1) Non-overlap: Two patterns are said to be non-overlapped, denoted by $p^1 \supset\subset p^2$,
if they have no common Inner-pattern.

(2) Overlap: Two patterns are said to be overlapped, denoted by $p^1 \cap p^2$, if they
have at least one common Inner-pattern.

(3) Contain: Contain is a special case of (2) when all the primitive patterns of
$p^1$ are contained in $p^2$. We say that $p^1$ is a subpattern of $p^2$ and denote this
relationship by $p^1 \subset p^2$.

(4) Equal: This is a special case of (3) when $p^1$ contains all the primitive patterns
of $p^2$, and vice versa. It is denoted by $p^1 = p^2$.
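
The four relationships can be told apart mechanically once patterns are held as sets of primitive patterns. The sketch below assumes normalized patterns made of binary patterns only, encoded as (kind, x, y) tuples with sorted end points; the helper names are hypothetical.

```python
def inter(x, y):
    """Inter-pattern with its end points in a canonical (sorted) order."""
    return ("inter",) + tuple(sorted((x, y)))


def inner_patterns(pattern):
    """All Inner-patterns (object instances) that occur in a pattern."""
    return {v for (_, x, y) in pattern for v in (x, y)}


def relationship(p1, p2):
    """Classify the relationship of p1 to p2, most specific case first."""
    p1, p2 = frozenset(p1), frozenset(p2)
    if p1 == p2:
        return "equal"
    if p1 < p2:  # every primitive pattern of p1 is in p2
        return "contain"
    if inner_patterns(p1) & inner_patterns(p2):
        return "overlap"
    return "non-overlap"


p = {inter("a1", "b1"), inter("b1", "c1")}
print(relationship({inter("a1", "b1")}, p))
print(relationship({inter("a1", "b1")}, {inter("c2", "d1")}))
```

Since equal is a special case of contain and contain a special case of overlap, the function tests the cases from the most to the least specific; "contain" is reported only when the first argument is the subpattern.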

Before defining the association operators, we give the definition of
"Association-set", the operand of the association operators.

Association-set:

An association-set, denoted by a Greek letter $\alpha$ (or $\beta$, $\gamma$, ...), is a set of association
patterns without duplicates. $\alpha^i$ designates the $i$th pattern in $\alpha$, where
$\alpha^i \neq \alpha^j$ ($\forall i \neq j$). An empty set is also an association-set, denoted by $\phi$.

A special type of association-set is called a homogeneous association-set, which
is important to the A-algebra, since some of the mathematical properties hold only
when operands are homogeneous association-sets.

Homogeneous Association-set:

An  association-set  is  homogeneous,  if 

(1)  all  patterns  are  formed  by  the  Inner-patterns  (or  object  instances)  of 
the  same  set  of  object  classes;  and 

(2)  all  patterns  have  the  same  number  of  Inner-patterns  from  each  class  in 
the  set;  and 

(3)  corresponding  primitive  patterns  belong  to  the  same  association  and  are 
of  the  same  type;  and 

(4)  all  patterns  have  the  same  topology. 
Otherwise,  it  is  a  heterogeneous  association-set. 

Figure 4.3 depicts three example association-sets: $\alpha$ is homogeneous, whereas
$\beta$ is not, since one of its patterns has only one Inner-pattern of class C instead of
two like the other two. $\gamma$ is not homogeneous because one of its patterns contains a
Complement-pattern, which makes it different from the other two (i.e., different topologies).

4.3 Association Operators

Ten association operators are formally defined in this section: three unary
operators [A-Project ($\Pi$), A-Select ($\sigma$), and A-Integrate (/)] and seven binary
operators [Associate (*), A-Complement (|), A-Union (+), A-Difference (-), A-Divide
($\div$), NonAssociate (!), and A-Intersect (•)]. The examples used to explain
these operators will make use of the domain A shown in Figure 4.4. To keep the
graph simple, the Complement-patterns are not shown in the figure. The simple
mathematical properties such as commutativity, associativity, idempotency, and
nilpotency satisfied by the operators are given after each definition.

4.3.1 Notations 

Notations  that  will  be  used  in  the  subsequent  sections  are  listed  below. 

$A, B, \ldots, K$                 Denote classes.

$CL_i$                          Denotes a variable for a class.

$[R(CL_1, CL_2)]$               Denotes the association between classes $CL_1$ and $CL_2$.

$a_i$                           Denotes the $i$th Inner-pattern of class A.

@                               Denotes an Inner-pattern variable.

$(a_i b_j)$                     Denotes an Inter-pattern between two classes A and B.

$(\overline{a_i b_j})$          Denotes a Complement-pattern between two classes A and B.

$(\underline{a_i c_k})$         Denotes a Derived-pattern from class A to class C.

$\alpha, \beta, \gamma, \ldots$ Denote association-sets.

$\alpha^i$                      Denotes the $i$th pattern of association-set $\alpha$.

$\{W\}, \{X\}, \{Y\}, \ldots$   Denote sets of classes. Hence, $\alpha_{\{X\}}$ represents association-set $\alpha$
                                which has Inner-pattern(s) from the classes in $\{X\}$.

It should be noted that an Inner-pattern is represented by an object instance
identifier (IID), which is a system-assigned object identifier (OID) prefixed by a
class identification so that the object instances of an object in multiple classes can
be unambiguously distinguished and the fact that these object instances are
instances of the same object can easily be recognized.

4.3.2 Operators 

All relational algebraic operators operate on relations of homogeneous (or
union-compatible) structures with the exception of Cartesian-product and Join.
The Cartesian-product and Join provide the mechanism to concatenate two relations
of different structures into a single relation, so that it can be further manipulated
by other operators. In the A-algebra, all the operators are defined to operate
on association patterns of homogeneous as well as heterogeneous structures.
Therefore, the relational algebra is a special case of the A-algebra in this respect.

(1)  Associate  (*): 

The  Associate  operator  is  a  binary  operator  which  constructs  an  association- 
set  of  complex  patterns  by  concatenating  the  patterns  represented  by  two  operand 
association-sets.  Since  a  pattern  may  involve  many  classes  and  an  object  class 
may  have  more  than  one  association  with  another  class,  it  is  necessary  to  specify 
through  which  association  the  concatenation  of  two  patterns  is  intended.  The 
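Associate operation on association-sets $\alpha$ and $\beta$ over the association R between
classes A and B is defined as follows:

$\alpha *[R(A,B)] \beta = \{\, \gamma \mid \gamma^k = (\alpha^i, \beta^j, a_m b_n) : (a_m b_n) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j \,\}$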

The result of an Associate operation is an association-set containing no duplicates.
Each of its patterns is the concatenation of two patterns (one from each
operand association-set). More specifically, if the Inner-pattern (or object) $a_m$ of A
in $\alpha^i$ is associated with the Inner-pattern (or object) $b_n$ of B in $\beta^j$ in the domain of
the algebra A shown in Figure 4.4, then $\alpha^i$ and $\beta^j$ are concatenated via the primitive
pattern $(a_m b_n)$.

We do not restrict A and B to be different classes in $*[R(A,B)]$, i.e.,
$\alpha *[R(A,A)] \beta$ is a legitimate operation, which concatenates two patterns (one from
each operand association-set) if they have a common Inner-pattern of class A.

An example of the Associate operation is shown in Figure 4.5a (for convenience,
a copy of the sample database is shown in each figure for illustrating an
operation). For clarity, we use graphical notation in the figures. In the example,
$\alpha^1$ is concatenated with $\beta^1$ and $\beta^2$, respectively, due to the existence of $(b_1 c_1)$ and
$(b_1 c_2)$ in A as shown in Figure 4.4. One pattern of $\alpha$ is dropped simply because it
does not have an Inner-pattern of class B. Another is dropped because $(b_2)$ is not
associated with any Inner-pattern of class C in A. One pattern of $\beta$ cannot be
concatenated through $(c_4)$ with any pattern in $\alpha$ because no pattern in $\alpha$ has an
Inner-pattern of B that is associated with $(c_4)$ in A. For the same reason, another
pattern of $\beta$ is dropped.
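
The behavior of the Associate operator can be illustrated with a small sketch. Everything in it is an assumption made for illustration: patterns are held as a pair of vertex and edge sets, an object such as "b1" is taken to belong to class "B", and the operands and the two regular edges are hypothetical rather than those of Figures 4.4 and 4.5a.

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    vertices: frozenset  # object instances (Inner-patterns)
    edges: frozenset     # (kind, frozenset({x, y})) binary patterns


def inner(x):
    return Pattern(frozenset({x}), frozenset())


def inter_pattern(x, y):
    return Pattern(frozenset({x, y}), frozenset({("inter", frozenset((x, y)))}))


def cls(obj):
    return obj[0].upper()  # assumption: "b1" is an instance of class "B"


def associate(alpha, beta, regular_edges, class_a, class_b):
    """alpha *[R(class_a, class_b)] beta over the given regular edges."""
    result = set()
    for p in alpha:
        for q in beta:
            for a in (v for v in p.vertices if cls(v) == class_a):
                for b in (v for v in q.vertices if cls(v) == class_b):
                    if class_a == class_b and a == b:
                        link = frozenset()  # common Inner-pattern, no new edge
                    elif frozenset((a, b)) in regular_edges:
                        link = frozenset({("inter", frozenset((a, b)))})
                    else:
                        continue
                    result.add(Pattern(p.vertices | q.vertices,
                                       p.edges | q.edges | link))
    return result


edges_bc = {frozenset(("b1", "c1")), frozenset(("b1", "c2"))}
alpha = {inter_pattern("a1", "b1"), inner("a2"), inter_pattern("a3", "b2")}
beta = {inner("c1"), inner("c2"), inner("c4")}
result = associate(alpha, beta, edges_bc, "B", "C")
print(len(result))
```

Collecting the results in a set keeps the output free of duplicates, and patterns that find no partner simply never reach the result.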

For the Associate operator, $[R(A,B)]$ can be omitted if the following conditions
hold: (1) both $\alpha$ and $\beta$ are A-algebra expressions, (2) the Associate operator
operates on the last class in a linear expression $\alpha$ and the first class in a linear
expression $\beta$, and (3) there is a unique association between these two classes. For
example, $A *[R(A,B)] B$ can be written as $A * B$, if class A is associated with class
B through the attribute $[R(A,B)]$ of A. It should be pointed out that the A-algebra
allows an attribute to be defined by a computed value (or object), for instance,
$B = f(A)$. The implementations of the function and the procedure are invisible to
the algebra. However, they should not have side effects, i.e., the computed result
must be of the same type as B.

The Associate operator is commutative and conditionally associative as
defined below:

$\alpha *[R(A,B)] \beta = \beta *[R(B,A)] \alpha$   (commutativity)

$(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}}) *[R(C,D)] \gamma_{\{Z\}}$
$= \alpha_{\{X\}} *[R(A,B)] (\beta_{\{Y\}} *[R(C,D)] \gamma_{\{Z\}})$   (associativity, if $C \notin \{X\} \wedge B \notin \{Z\}$)

$A *[R(A,A)] A = A$   (idempotency)

The associativity holds true if $\alpha$ and $\gamma$ do not have Inner-patterns of classes C
and B, respectively. Otherwise, the associativity does not hold. For example, if
$\alpha$ contains an Inner-pattern of class C, $\beta = (b_1 c_1)$, and A is as shown in Figure 4.4
(the domain of the algebra), then it is possible that

$(\alpha *[R(A,B)] \beta) *[R(C,D)] \gamma \neq \phi$

and

$\alpha *[R(A,B)] (\beta *[R(C,D)] \gamma) = \phi$

(2) A-Complement (|):

The  A-Complement  operator  is  a  binary  operator  which  concatenates  the 
patterns  of  two  operand  association-sets  over  Complement-patterns.  It  is  used  to 
identify  the  objects  in  two  classes  which  are  not  associated  with  each  other  in  A. 
The  A-Complement  operator  is  defined  as  follows: 

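$\alpha \,|[R(A,B)]\, \beta = \{\, \gamma \mid \gamma^k = (\alpha^i, \beta^j, \overline{a_m b_n}) : (\overline{a_m b_n}) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j$
    or  $\gamma^k = \alpha^i : \exists (m)(a_m \in \alpha^i) \wedge \nexists (n)(b_n \in \beta)$
    or  $\gamma^k = \beta^j : \exists (n)(b_n \in \beta^j) \wedge \nexists (m)(a_m \in \alpha) \,\}$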

The result of an A-Complement operation is an association-set. Each of its
patterns is formed by concatenating two patterns (one from each operand
association-set) via a Complement-pattern $(\overline{a_m b_n})$, where $a_m$ and $b_n$ belong to $\alpha^i$
and $\beta^j$, respectively, and the Complement-pattern $(\overline{a_m b_n})$ is in A. In the special
case when $\alpha$ (or $\beta$) is an empty association-set or does not have Inner-patterns of
class A (or B), all patterns of $\beta$ (or $\alpha$) that have Inner-patterns of B (or A) are
retained in the resulting association-set.

An example of the A-Complement operation is shown in Figure 4.5b. It
operates over the association between classes B and C. $\alpha^2$ does not appear in the
resultant association-set because it contains no Inner-patterns of B. $\alpha^1$ cannot be
A-Complemented with $\beta^1$ and $\beta^2$ because it is connected with $\beta^1$ and $\beta^2$ by Inter-patterns
$(b_1 c_1)$ and $(b_1 c_2)$ in A, respectively.

Under the same conditions as given for the Associate operator, [R(A,B)] need not be specified with the A-Complement operator unless there is an ambiguity. The A-Complement operator is commutative and associative. For reasons similar to those described for the Associate operator, the associativity holds only conditionally.

$$\alpha\ |[R(A,B)]\ \beta = \beta\ |[R(B,A)]\ \alpha \qquad \text{(commutativity)}$$

$$(\alpha_{\{X\}}\ |[R(A,B)]\ \beta_{\{Y\}})\ |[R(C,D)]\ \gamma_{\{Z\}} = \alpha_{\{X\}}\ |[R(A,B)]\ (\beta_{\{Y\}}\ |[R(C,D)]\ \gamma_{\{Z\}}) \quad (\text{if } C \notin \{X\} \wedge B \notin \{Z\}) \qquad \text{(associativity)}$$

$$A\ |[R(A,A)]\ A = \phi \qquad \text{(nilpotency)}$$

(3)  A-Select  ($\sigma$):

The A-Select is a unary operator which operates on an association-set $\alpha$ to produce a subset of patterns that satisfy a specified predicate P. A pattern in the operand association-set is retained iff the predicate evaluates to true for that pattern.

$$\sigma(\alpha)[P] = \{\ \gamma \mid \gamma^k = \alpha^i :\ P(\alpha^i) = true\ \}$$

where $\alpha$ is defined by an algebraic expression, and $P = T_1 \theta_1 T_2 \theta_2 \cdots \theta_{n-1} T_n$. Each term $T_i\ (i = 1, 2, \ldots, n)$ is a comparison between two expressions, and $\theta_i\ (i = 1, 2, \ldots, n-1)$ is a Boolean operator ($\wedge$ or $\vee$). $P(\alpha^i) = true$ means that the pattern $\alpha^i$ satisfies the predicate.

The  expressions  on  the  left-  and  right-hand  sides  of  a  comparison  operation 
may  contain  constants,  functions,  and/or  operations  on  objects,  but  cannot  both 
be  constants.  The  comparison  terms  are  type  sensitive,  i.e.,  the  results  of  the  two 
expressions  in  a  term  should  be  data  of  the  same  type  for  primitive-classes  or  both 
IIDs for nonprimitive-classes. $=$, $>$, $<$, $\geq$, $\leq$, and $\neq$ are the legitimate comparisons for numerical types; $=$ and $\neq$ for character, string, and IID types; and $=$, $\subset$, $\supset$, $\subseteq$, $\supseteq$, and $\neq$ for set types. The comparison of two IIDs is performed by comparing their OID portions, since IIDs are the concatenations of the class identifiers and OIDs. A single-valued object or a single IID can be treated either as its own data type in a numerical, string, or IID comparison, or as a set type containing one element in a set comparison.

As an example of A-Select, assume that there are two associated classes: S for stack and Q for queue. To select the associated stack and queue object pairs in which the top and the bottom of the stack have some object(s) in common with the head and the tail of the queue, we can write

$$\sigma(S * Q)[(top(S) \cup bottom(S)) \cap (head(Q) \cup tail(Q)) \neq \phi]$$

To select the pairs in which the top equals the head and the bottom equals the tail, we have

$$\sigma(S * Q)[top(S) = head(Q) \wedge bottom(S) = tail(Q)]$$
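The two stack and queue selections above can be mimicked on toy data. The following is a minimal Python sketch, not the OSAM* implementation: each pattern of S * Q is assumed to be a dict whose hypothetical keys `top`, `bottom`, `head`, and `tail` hold sets of object identifiers, and `a_select` is an illustrative helper name.

```python
def a_select(assoc_set, predicate):
    """Keep only the patterns for which the predicate evaluates to true."""
    return [p for p in assoc_set if predicate(p)]

# Hypothetical patterns of S * Q; single values are modelled as one-element sets,
# following the rule that a single value may be treated as a set of one element.
patterns = [
    {"top": {"o1"}, "bottom": {"o2"}, "head": {"o1"}, "tail": {"o2"}},
    {"top": {"o3"}, "bottom": {"o4"}, "head": {"o4"}, "tail": {"o5"}},
    {"top": {"o6"}, "bottom": {"o7"}, "head": {"o8"}, "tail": {"o9"}},
]

def shares_end(p):
    # (top(S) U bottom(S)) intersected with (head(Q) U tail(Q)) is not empty
    return bool((p["top"] | p["bottom"]) & (p["head"] | p["tail"]))

def same_ends(p):
    # top(S) = head(Q) and bottom(S) = tail(Q)
    return p["top"] == p["head"] and p["bottom"] == p["tail"]

shared = a_select(patterns, shares_end)
equal = a_select(patterns, same_ends)
```

In this sketch the second predicate is strictly more selective than the first, so every pattern kept by `same_ends` is also kept by `shares_end`.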

(4)  A-Project  ($\Pi$):

Similar to the projection operation in the relational algebra, an A-Project operation is defined to project subpattern(s) of a pattern. However, in the relational algebra, the relationship among the projected attributes is not important, whereas in A-algebra the association among the projected subpatterns must be maintained so that the associations among the objects in these subpatterns are retained. The A-Project operator is written as follows:

$$\Pi(\alpha)[\mathcal{E}; T]$$

where $\alpha$ is an association-set defined by an A-algebra expression; $\mathcal{E} = (e_1, e_2, \ldots, e_n)$ is a set of expressions which specify the subpatterns to be projected; and $T = (t_1, t_2, \ldots, t_m)$ is a set of ordered sets of classes. Each ordered set, $t_k$, specifies a path connecting two projected subpatterns defined by the $\mathcal{E}$ expressions.

$e_i\ (i = 1, 2, \ldots, n)$ is a subexpression of the expression which defines $\alpha$. $e_i$ and $e_j\ (i \neq j)$ should not contain a common class. There may be many paths connecting two subpatterns in the original pattern. The path to be retained can be specified in $t_k$. If a specific path is chosen, a minimal number of classes along the path which can uniquely identify the path should be specified. The result of an A-Project operation over a pattern is its subpatterns defined by $\mathcal{E}$ and some paths defined by $T$ that connect these subpatterns. If a path in the original pattern consists of all Inter-patterns, a D-inter-pattern is retained. Otherwise, a D-complement-pattern is included. Multiple paths between two projected subpatterns can be declared in $T$, if so desired.

Figure  4.5c  shows  an  example  of  A-Project  from  a  pattern  a  over  A*B  and 
D.  For  or1,  the  subpatterns  (a,*,)  and  (<*,)  satisfy  A*B  and  D,  respectively.  There- 
fore, they  are  kept  in  the  result.  According  to  the  path  specification  stated  in  the 
operation,  a  Derived-pattern  (6^,)  is  added  to  the  result,  thus  7*-(a1&1,  d,  \d).  Its 
normalized  form  is  ri={albvl>ld).  ^  is  produced  for  the  same  reason.  Since  a 
does  not  have  a  subpattern  satisfying  A*B,  only  (<£,)  is  retained. 

(5)  NonAssociate  (!): 

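The NonAssociate operator is a binary operator used to identify the association patterns in one operand association-set that are not associated (over a specified association) with any pattern in the other association-set, and vice versa, in the domain of the algebra A. The NonAssociate operator is defined as follows:

$$
\begin{aligned}
\alpha\ ![R(A,B)]\ \beta = \{\ \gamma \mid\ & \gamma^k = (\alpha^i, \beta^j, \overline{a_m b_n}) :\ (\overline{a_m b_n}) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j \\
& \quad \wedge\ \forall((a_m b_{n'}), (a_{m'} b_n) \in A)(a_{m'} \notin \alpha \wedge b_{n'} \notin \beta) \\
\text{or}\ & \gamma^k = \alpha^i :\ \exists(m)(a_m \in \alpha^i) \wedge \nexists(n)(b_n \in \beta) \\
& \quad \vee\ \forall(b_n \in \beta)\,\exists(k, k \neq m)(a_k \in \alpha \wedge (a_k b_n) \in [R(A,B)]) \\
\text{or}\ & \gamma^k = \beta^j :\ \exists(n)(b_n \in \beta^j) \wedge \nexists(m)(a_m \in \alpha) \\
& \quad \vee\ \forall(a_m \in \alpha)\,\exists(k, k \neq n)(b_k \in \beta \wedge (a_m b_k) \in [R(A,B)])\ \}
\end{aligned}
$$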

The result of a NonAssociate operation is an association-set. Each of its patterns is formed by concatenating two patterns $\alpha^i$ and $\beta^j$ via a Complement-pattern $(\overline{a_m b_n})$ under the condition that $\alpha^i$ is not associated with any pattern of $\beta$ and vice versa. Furthermore, in the special case where the patterns of $\alpha$ (or $\beta$) have Inner-patterns of A (or B) and cannot be concatenated with any pattern of $\beta$ (or $\alpha$), these patterns of $\alpha$ (or $\beta$) are retained in the result if one of the following three conditions holds: (1) $\beta$ (or $\alpha$) is an empty association-set, (2) all patterns of $\beta$ (or $\alpha$) do not have Inner-patterns of B (or A), or (3) all patterns of $\beta$ (or $\alpha$) that have Inner-patterns of B (or A) can be concatenated with patterns of $\alpha$ (or $\beta$).

An  example  of  the  NonAssociate  operation  is  shown  in  Figure  4.5d.  In  the 
example,  a  and  0  are  dropped  due  to  the  existence  of  (6jC2)  in  Figure  4.4.  a  is 
dropped  because  it  does  not  contain  an  Inner-pattern  of  class  B.  0  is  dropped 
because  it  does  not  contain  an  Inner-pattern  of  class  C.  7  is  in  the  resultant 
association-set  because  (62)  is  not  associated  with  (c4)  in  A  as  shown  in  Figure  4.4 
and  (63)  does  not  appear  in  a.   7  exists  because  (b2)  is  not  associated  with  (c3)  in  A. 

Note that the NonAssociate operator produces a resultant association-set which is a subset of that produced by the A-Complement operator, because $\alpha^i$, $\beta^j$, and $(\overline{a_m b_n})$ may form a new pattern only when $a_m$ of $\alpha^i$ is not associated with any object of B in $\beta$ and $b_n$ of $\beta^j$ is not associated with any object of A in $\alpha$. In fact, the NonAssociate operator can be expressed in terms of A-Complement and other operators as follows:

$$A\ ![R(A,B)]\ B = (A - \Pi(A *[R(A,B)]\ B)[A])\ |[R(A,B)]\ (B - \Pi(A *[R(A,B)]\ B)[B])$$

Thus, NonAssociate is not a primitive operator in a strict sense. However, it is very useful for query formulation and is therefore included in the set of A-algebra operators.

Under the same conditions as given for the Associate operator, [R(A,B)] need not be specified unless there is an ambiguity. The NonAssociate operator is commutative but not associative.

$$\alpha\ ![R(A,B)]\ \beta = \beta\ ![R(B,A)]\ \alpha \qquad \text{(commutativity)}$$

$$A\ ![R(A,A)]\ A = \phi \qquad \text{(nilpotency)}$$
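The relationship among Associate, A-Complement, and NonAssociate can be illustrated on a deliberately simplified model. The Python sketch below assumes that every operand pattern is a single Inner-pattern (one object), that the association [R(A,B)] is given as a set of Inter-pattern pairs `R`, and that the special cases in which an operand pattern is retained on its own are ignored. The function names are illustrative and are not part of the algebra.

```python
def associate(alpha, beta, R):
    """Pairs connected by an Inter-pattern of R."""
    return {(a, b) for a in alpha for b in beta if (a, b) in R}

def a_complement(alpha, beta, R):
    """Pairs connected by a Complement-pattern, i.e. pairs not in R."""
    return {(a, b) for a in alpha for b in beta if (a, b) not in R}

def non_associate(alpha, beta, R):
    """Pairs whose members are not associated with any object of the other operand."""
    free_a = {a for a in alpha if not any((a, b) in R for b in beta)}
    free_b = {b for b in beta if not any((a, b) in R for a in alpha)}
    return {(a, b) for a in free_a for b in free_b}

alpha = {"a1", "a2", "a3"}
beta = {"b1", "b2", "b3"}
R = {("a1", "b1"), ("a2", "b1")}  # hypothetical Inter-patterns

assoc = associate(alpha, beta, R)
comp = a_complement(alpha, beta, R)
nonassoc = non_associate(alpha, beta, R)

# Right-hand side of the identity: (A - Pi(A*B)[A]) | (B - Pi(A*B)[B])
rewritten = a_complement(
    alpha - {a for a, _ in assoc},
    beta - {b for _, b in assoc},
    R,
)
```

On this data `nonassoc` contains only the pairs built from `a3` with `b2` and `b3`, it is a subset of `comp`, and it coincides with `rewritten`, which mirrors the statement that NonAssociate is not a primitive operator.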

(6)  A-Intersect  (•): 

The  A-Intersect  operation  is  convenient  for  constructing  a  pattern  with  a 
branch  or  a  lattice  structure  (a  pattern  that  has  a  loop),  since  a  pattern  in  such 
structures  can  be  viewed  as  the  intersection  of  two  patterns.  Conceptually,  the 
A-Intersect  operator  is  equivalent  to  the  JOIN  operator  in  the  relational  algebra. 
It  operates  on  two  operand  association-sets  over  a  set  of  specified  classes.  Two 
patterns,  one  from  each  association-set,  are  combined  into  one  if  they  contain  the 
same  set  of  Inner-patterns  for  each  specified  class.  The  A-Intersect  operation  is 
defined as follows:

$$
\forall(CL_n \in \{W\})\,\forall(@ \in CL_n,\ @ \in \alpha^i)(@ \in \beta^j)\ \wedge\ \forall(CL_n \in \{W\})\,\forall(@ \in CL_n,\ @ \in \beta^j)(@ \in \alpha^i)\ \}
$$

Figure  4.5e  shows  an  example  of  the  A-Intersect  operation  over  classes  B  and 
C.  The  resultant  association-set  contains  four  patterns,  which  are  the  intersection 
of  a  up,  a  uf?,  auft,  and  a  up2,  respectively,  since  they  all  have  Inner-patterns 
(6,)  and  (c2).  Other  patterns  {a,  a,  $ ,  01)  fail  to  produce  new  patterns  because 
they  either  have  no  Inner-pattern  in  both  classes  B  and  C  or  have  no  common 
Inner-pattern  of  class  C. 

The set of classes {W} can be omitted when the A-Intersect operation is performed on all the common classes of its operands, i.e., $\{W\} = \{X\} \cap \{Y\}$ is implied.

Since a lattice pattern can be transformed into a set of other simple patterns, an A-Intersect operation for building a complex pattern can be replaced by an Associate operation followed by an A-Select operation (see Section 4 for detail). The A-Intersect operator is commutative, conditionally associative, and idempotent.

$$\alpha \bullet\{W\}\ \beta = \beta \bullet\{W\}\ \alpha \qquad \text{(commutativity)}$$

$$(\alpha_{\{X\}} \bullet\{W_1\}\ \beta_{\{Y\}}) \bullet\{W_2\}\ \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet\{W_1\}\ (\beta_{\{Y\}} \bullet\{W_2\}\ \gamma_{\{Z\}}) \qquad \text{(associativity)}$$

$$(\text{if } (\{W_1\} - \{W_2\}) \cap \{Z\} = \phi \wedge (\{W_2\} - \{W_1\}) \cap \{X\} = \phi)$$

$$\alpha \bullet \alpha = \alpha \quad (\text{if } \alpha \text{ is a homogeneous association-set}) \qquad \text{(idempotency)}$$

The associativity is not always true because there are cases in which a pattern of $\beta$ which fails to intersect with any pattern of $\gamma$ may succeed by first intersecting with a pattern of $\alpha$ in the operation ($\bullet\{W_1\}$) and then intersecting with a pattern of $\gamma$ in the operation ($\bullet\{W_2\}$).
Now we define three set operators, which are different from the corresponding set operators in relational algebra, since they operate on heterogeneous structures as well as homogeneous structures.

(7)  A-Integrate  (I):

The  A-Integrate  is  a  unary  operator.  It  reorganizes  patterns  in  an 
association-set  according  to  the  relationships  among  patterns  with  respect  to  the 
classes  specified.   The  A-Integrate  operation  is  defined  as  follows: 

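$$I_{\{W\}}(\alpha) = \{\ \gamma \mid \gamma^k = (\alpha_s) :\ \forall(k,\ CL_n \in \{W\} \wedge @ \in CL_n \wedge @ \in \alpha^i \wedge \alpha^i \in \alpha_s)(@ \in \alpha^k \wedge \alpha^k \in \alpha_s)\ \}$$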

By this definition, a subset of patterns ($\alpha_s$) of $\alpha$ is combined into a single pattern if every object instance of the classes in {W} that appears in a pattern in the subset is also contained in all other patterns in the subset. If a pattern of $\alpha$ cannot be combined with any other pattern, it is retained in the resultant association-set as it is.

If no class is specified, patterns in which every pattern has at least one object instance (of any class) in common with another are integrated into one pattern. The reorganized association-set will contain patterns which are apart from each other (refer to Section 4.2).

Figure  4.5f  shows  two  examples.  The  first  example  shows  an  A-Integrate 
operation  over  class  A.  Patterns  that  have  common  Inner-pattern  of  class  A  are 
grouped  into  one  (7  is  the  integration  of  a  ,  a  ,  and  a  ;  and  7  is  the  integration  of 
a  and  a  ).  All  other  patterns  in  or  are  retained  in  the  result  as  they  are.  The 
second example illustrates an A-Integrate operation on the same association-set as the first example but without specifying a class. The result becomes two patterns, which are apart and are exactly the same as they appear in the original database, whereas the same primitive patterns appear more than once in the result of the first example.

(8)  A-Union  (+):

Similar to the UNION operation of the relational algebra, A-Union combines two association-sets into one. However, these two association-sets can contain heterogeneous association structures. It is important for A-algebra to be able to operate on heterogeneous structures because some prior operations may produce heterogeneous association-sets, which may need to be further processed over the objects of a common class against other patterns of associations. Unlike the relational algebra and other O-O query languages, union-compatibility is not a restriction in A-algebra. For this reason, A-algebra has more expressive power. Any query that can be expressed by a single expression in other languages can be expressed as a single A-algebra expression, but not vice versa. The A-Union operation is defined as follows:

$$\alpha + \beta = \{\ \gamma \mid \gamma \in \alpha \vee \gamma \in \beta\ \}$$

The A-Union operator is commutative, associative, and idempotent:

$$\alpha + \beta = \beta + \alpha \qquad \text{(commutativity)}$$

$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \qquad \text{(associativity)}$$

$$\alpha + \alpha = \alpha \qquad \text{(idempotency)}$$
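Because A-Union places no union-compatibility restriction on its operands, it behaves like plain set union over patterns of arbitrary shape. The Python sketch below illustrates this under an assumed encoding that is not taken from the dissertation: a pattern is a `frozenset` of primitive patterns, each primitive pattern being a tuple of object identifiers.

```python
def a_union(alpha, beta):
    """A-Union: every pattern that occurs in either operand."""
    return alpha | beta

# Heterogeneous operands: the patterns differ in the classes they involve.
alpha = {
    frozenset({("a1", "b1")}),               # one Inter-pattern
    frozenset({("a2",)}),                    # a lone Inner-pattern
}
beta = {
    frozenset({("b1", "c1"), ("c1", "d1")}),  # a two-edge linear pattern
    frozenset({("a2",)}),                    # also present in alpha
}
gamma = {frozenset({("d2",)})}

union_ab = a_union(alpha, beta)
```

The three listed properties follow directly from the corresponding properties of set union, and the pattern shared by both operands appears only once in `union_ab`.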
(9)  A-Difference  (-):

The A-Difference implements the same concept as the DIFFERENCE operator in relational algebra, but with two differences. First, its operands do not have to be union-compatible. Second, a pattern in the minuend is retained if it does not contain any of the patterns in the subtrahend.

The  example  depicted  in  Figure  4.5g  shows  that  a  and  a  are  dropped  since 
they  both  contain  ft. 
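The containment test that distinguishes A-Difference from the relational DIFFERENCE can be sketched as follows. This is an illustration under stated assumptions rather than the dissertation's definition: a pattern is encoded as a `frozenset` of primitive patterns (tuples of object identifiers), and "contains" is taken to mean that every primitive pattern of the subtrahend pattern also occurs in the minuend pattern.

```python
def a_difference(alpha, beta):
    """Retain the minuend patterns that contain no subtrahend pattern."""
    return {p for p in alpha if not any(q <= p for q in beta)}

# Hypothetical minuend patterns.
p1 = frozenset({("a1", "b1"), ("b1", "c1")})
p2 = frozenset({("a2", "b2")})
p3 = frozenset({("a3", "b1"), ("b1", "c1")})

# Hypothetical subtrahend pattern, contained in p1 and p3 but not in p2.
q = frozenset({("b1", "c1")})

alpha = {p1, p2, p3}
beta = {q}
result = a_difference(alpha, beta)
```

Note that `q` is not itself a member of `alpha`, so an ordinary set difference would have removed nothing; the containment test is what drops `p1` and `p3`.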


(10)  A-Divide  ($\div$):

The A-Divide operator implements the concept that a group of patterns with certain common features contains another set of patterns.

$$\alpha \div\{W\}\ \beta = \{\ \gamma \mid \gamma^k = \alpha_t^i :\ \beta \subseteq \alpha_t\ \}$$

where $\alpha_t$ is a subset of the patterns of $\alpha$ which have common Inner-patterns for all classes of {W} and which together contain all patterns of $\beta$. If {W} is not specified, the A-Divide operation retains all the patterns of $\alpha$ if each of them contains at least one pattern of $\beta$ and they together contain all patterns of $\beta$.

Figure 4.5h shows an example of $\alpha$ being divided by $\beta$ with respect to class B. The A-Divide operation retains $\alpha^1$, $\alpha^2$, and $\alpha^3$ since they all contain the Inner-pattern $(b_1)$ of B and together contain all patterns of $\beta$.
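The grouping behaviour of A-Divide with respect to a single class can be sketched in Python. The encoding is a simplification assumed only for this illustration: a pattern is a `frozenset` of object identifiers (edges are ignored), the class of an object is the upper-cased first letter of its identifier, and a group "together contains" the divisor when every divisor pattern is a subset of at least one pattern in the group.

```python
from collections import defaultdict

def objects_of(pattern, cls):
    """Objects of class `cls` in a pattern (class = first letter, an assumption)."""
    return frozenset(o for o in pattern if o[0].upper() == cls)

def a_divide(alpha, beta, cls):
    """Retain the groups of patterns that share their `cls` objects and cover beta."""
    groups = defaultdict(set)
    for p in alpha:
        key = objects_of(p, cls)
        if key:  # patterns without an Inner-pattern of `cls` form no group
            groups[key].add(p)
    result = set()
    for group in groups.values():
        if all(any(q <= p for p in group) for q in beta):
            result |= group
    return result

# Hypothetical dividend patterns: p1, p2, p3 share b1; p4 has b2; p5 has no B object.
p1 = frozenset({"a1", "b1", "c1"})
p2 = frozenset({"a2", "b1", "c2"})
p3 = frozenset({"b1", "c1", "d1"})
p4 = frozenset({"a3", "b2", "c1"})
p5 = frozenset({"a4"})

alpha = {p1, p2, p3, p4, p5}
beta = {frozenset({"c1"}), frozenset({"c2"})}
quotient = a_divide(alpha, beta, "B")
```

Here the group sharing `b1` covers both divisor patterns and is retained as a whole, whereas the group sharing `b2` lacks `c2` and is dropped.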
4.3.3  Precedence

The precedence relationships of the above operators are as follows. Unary
operators  have  higher  precedence  than  binary  operators.  The  precedence  of  the
seven  binary  association  operators  is  given  in  the  following  order:  *,  |,  !,  •,  +,  -, 
and  +.   Parentheses  can  be  used  to  alter  the  precedence  relationships. 

4.3.4  Summary  of  operators 

(1)  Associate (*): Two patterns are concatenated via an Inter-pattern.

(2)  A-Complement (|): Two patterns are concatenated via a Complement-pattern.

(3)  A-Select ($\sigma$): A pattern is retained if it satisfies the predicate.

(4)  A-Project ($\Pi$): A subpattern is projected from the original pattern.

(5)  NonAssociate (!): Two patterns are concatenated via a Complement-pattern only if each of them cannot be concatenated with any pattern of the other operand via an Inter-pattern.

(6)  A-Intersect (•): Two patterns are combined into a single pattern if their common classes have common object(s).

(7)  A-Integrate (I): Patterns in an association-set are combined if objects of a specified class in a pattern are common to these patterns.

(8)  A-Union (+): Two association-sets are lumped into a single set.

(9)  A-Difference (-): A pattern in the minuend is retained if it does not contain any pattern in the subtrahend.

(10)  A-Divide ($\div$): A subset of patterns in the dividend that have certain common feature(s) and contain all the patterns in the divisor is retained.
4.4  Query Examples

We have formally defined ten association operators and given their simple mathematical properties. Before exploring other properties, we give some examples to illustrate how these operators can be used to formulate queries for processing an O-O database. There can be many alternative expressions for the same query. Choosing the best one for execution is the task of a query optimizer. The mathematical properties of these operators can be used for that purpose.

In the following formulation of algebraic expressions, we assume that the user is using the algebra directly instead of a high-level query language. In the latter case, the task of generating algebraic expressions would belong to the translator.

To formulate an A-algebra expression for a query, we first need to construct an intensional pattern for it by navigating the schema graph of the database as illustrated in Chapter 3. Then, each edge of the pattern is marked with an operator *, |, or ! based on the intended semantics. For simple patterns, the formulation is straightforward. For patterns with complex structures, we may have to decompose them into patterns with simpler structures. The expression for the original pattern is the A-Intersect of the expressions for the decomposed patterns.

First,  we  formulate  expressions  for  Query  1  to  Query  4  given  in  Chapter  3. 
We  have  identified  the  intensional  patterns  for  these  queries  (see  Figure  3.3). 

Query  1:     For  all  sections,  get  the  majors  of  students  who  are  taking  these 
sections. 

It is trivial to write an algebraic expression for Query 1, which is represented by a linear pattern. For this pattern, both edges are marked with * and the algebraic expression can be formulated as follows:

$$I_{\{Section\}}(\Pi(Section * Student * Department)[Section, Department; Section{:}Department])$$

where the A-Integrate operation groups the resultant patterns by Sections.

Query  2:    List  students  who  major  and  minor  in  the  same  department. 

For  Query  2,  the  edges  of  the  intensional  pattern  shown  in  Figure  3.3c  are  all 
marked  with  *.  Since  this  loop  structure  can  be  viewed  as  the  A-Intersect  of  two 
linear  patterns  involving  both  Student  and  Department,  we  have 

$$\Pi(Student * Undergrad * Department \bullet Student * Department)[Student]$$

where  the  A-Project  operation  gets  the  student  objects  that  satisfy  the  association 
pattern  as  required  by  the  query. 

Query  3:     For  those  students  taking  section  300  and  having  majors  and/or 
minors,  get  their  majors  and/or  minors. 

The expression for the intensional pattern of Query 3 is as follows:

$$Section\# * Section * (Student * Department + Student * Undergrad * Department\_1)$$

where the A-Union operator is used to realize the OR condition at the class Student. As long as a student has a major or a minor, the linear pattern from Student to Department and the linear pattern from Student to Undergrad and to Department should be retained. In the expression, Department_1 is an alias of Department, which is used to distinguish major and minor departments. Since the query asks for the majors and minors of students who are taking section 300, the A-Select and A-Project operations are used. Thus, we have

$$I_{\{Student\}}(\Pi(\sigma(\alpha)[Section\# = 300])[Student, Department, Department\_1; Student{:}Department, Student{:}Department\_1])$$

where $\alpha$ is the intensional pattern given above. The result of this expression will contain the derived patterns shown in Figure 3.3g, which are specified by the $[\mathcal{E}; T]$ clause of the projection operation and are reorganized by an A-Integrate operation. Note that Query 3 cannot be phrased in a single relational algebra expression since (a) the union operation in relational algebra requires operands to be union-compatible, (b) using a join operation on Student can cause a loss of information because not every student has both a major and a minor, (c) the Cartesian product of the majors and minors will produce erroneous results, and (d) no other operation in the relational algebra can combine two relations into one.

Query  4:     For  each  teacher,  list  the  sections  which  he/she  does  not  teach. 

The  algebraic  expression  for  Query  4  can  be  easily  formulated  as  follows, 
since  it  is  represented  by  a  linear  pattern  shown  in  Figure  3.3h.  We  note  that  the 
A-Complement  operator  |,  rather  than  the  NonAssociate  operator  !,  should  be 
used  for  this  query,  since  a  teacher  may  be  teaching  some  courses. 

$$Teacher\ |\ Section$$

Several other query examples are given below. They use the schema graph given in Figure 3.1. Their corresponding intensional patterns are depicted in Figure 4.6.

Query  5:     List the names of students who teach in the same departments as their major departments.

We can see from Figure 4.6 that the intensional pattern for this query can be constructed in two ways. One way is to decompose it into three linear patterns:

Name-Person-Student,    Student-Department,    and
Student-Grad-TA-Teacher-Department

The A-Intersect of these three patterns will produce a pattern that satisfies this query.

$$\Pi(Student * Person * Name \bullet Student * Department \bullet Student * Grad * TA * Teacher * Department)[Name]$$

where the first A-Intersect operation operates over Student and the second operates over Student and Department. The A-Project operation projects the names of these students.

Another way is to decompose the intensional pattern into two linear patterns:

Name-Person-Student-Department    and
Student-Grad-TA-Teacher-Department

Therefore, we have an alternative expression

$$\Pi(Name * Person * Student * Department \bullet Student * Grad * TA * Teacher * Department)[Name]$$


Query  6:     List the section# of those sections which have not been assigned a room or have not been assigned a teacher.

Since the query requests sections that have not been assigned a room or a teacher, these sections must not be connected with any room or any teacher (i.e., a section which is not associated with any room or any teacher should also be retained in the result). Therefore, there should be Complement-patterns between Section and Teacher and between Section and Room, and a single arc between these two branches as shown in Figure 4.6. We emphasize that the ! operation, instead of |, should be used to construct these two Complement-patterns. The algebraic expression for this query can then be formulated as follows:

$$\Pi(Section\# * (Section\ !\ Room\# + Section\ !\ Teacher))[Section\#]$$

Query  7:     List  the  names  of  students  who  take  courses  6010  and  6020. 

We shall show three ways of formulating an expression for this query. First, the intensional pattern for Query 7 shown in Figure 4.6 can be constructed by the A-Intersect of two linear patterns as we did for Query 5:

$$\Pi(\sigma(Name * Person * Student * Enrollment * Course * Course\#)[Course\# = 6010] \bullet \sigma(Student * Enrollment\_1 * Course\_1 * Course\#\_1)[Course\# = 6020])[Name]$$

where Enrollment_1, Course_1, and Course#_1 are the aliases of the classes Enrollment, Course, and Course#, respectively. This ensures that the A-Intersect operation will be performed only over the Student class.

A second way is to view the original pattern as a linear pattern without restriction on Course# as follows:

Name-Person-Student-Enrollment-Course-Course#

Students who are taking both courses must participate in at least two such patterns, with Course# = 6010 and Course# = 6020, respectively. This implies an A-Divide operation. Thus, the query can be formulated as follows:

$$\Pi(Name * Person * Student * Enrollment * Course * Course\# \div\{Student\}\ \sigma(Course.Course\#)[Course\# = 6010 \vee Course\# = 6020])[Name]$$

where the dot in Course.Course# is used only for identifying the Course# class which is defined in the Course class. It does not represent a function or a method as in other languages. This expression can also be rewritten as follows:

$$\Pi(Name * Person * \Pi(Student * Enrollment * Course * Course\# \div\{Student\}\ \sigma(Course.Course\#)[Course\# = 6010 \vee Course\# = 6020])[Student])[Name]$$

which is more suitable for execution than the first since the inner A-Project gets the student objects who are taking these two courses, so that all other data associated with these students, such as Enrollment, Course, and Course#, do not have to be carried along in further processing to get the names of these students. Details of optimization issues will be addressed in the next chapter.
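The equivalence of the intersection-based and division-based formulations of Query 7 can be checked on toy data. The Python sketch below flattens the Student-Enrollment-Course-Course# path into hypothetical (student, course number) pairs; the data and the function names are assumptions made for illustration only.

```python
# Hypothetical enrollment facts: (student name, course number).
enrollment = {
    ("ann", 6010), ("ann", 6020),
    ("bob", 6010),
    ("cy", 6020), ("cy", 6030),
}

def by_intersect(enrollment):
    """Select each course separately, then intersect over the Student class."""
    takes_6010 = {s for s, c in enrollment if c == 6010}
    takes_6020 = {s for s, c in enrollment if c == 6020}
    return takes_6010 & takes_6020

def by_divide(enrollment, wanted):
    """Group the patterns by student and keep the groups that cover all wanted courses."""
    courses = {}
    for s, c in enrollment:
        courses.setdefault(s, set()).add(c)
    return {s for s, cs in courses.items() if wanted <= cs}

names_intersect = by_intersect(enrollment)
names_divide = by_divide(enrollment, {6010, 6020})
```

Both formulations return the same students on this data; the division form generalizes to any number of required courses without adding one aliased pattern per course.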

We stress that the above association pattern expressions represent the internal algebraic operations that need to be performed if the dynamic inheritance method is used. The high-level query statements corresponding to these algebraic expressions issued by the user can be much simpler due to the inheritance of attributes in the generalization hierarchy or lattice.
Figure 4.1  Regular-edges and Complement-edges in an OG

Figure 4.2  Examples of association patterns (panel (b): complex association patterns)

Figure 4.3  Examples of association-sets

Figure 4.4  A sample database association graph (the Complement-patterns are not shown)

Figure 4.5  Example of operations (the Complement-patterns are not shown): (a) an Associate operation; (b) an A-Complement operation; (c) an A-Project operation; (d) a NonAssociate operation; (e) an A-Intersect operation; (f) A-Integrate operations; (g) an A-Difference operation; (h) an A-Divide operation

Figure 4.6  Intensional patterns of Query 5, 6, and 7


CHAPTER  5 

MATHEMATICAL  PROPERTIES  OF  OPERATORS 

AND  THEIR  APPLICATIONS 

IN  QUERY  OPTIMIZATION  AND  QUERY  DECOMPOSITION 


In Section 4.3, we have shown some mathematical properties of individual operators. In this chapter, we shall study their properties systematically. The properties of A-algebra are classified into six categories: (1) conventional algebraic properties such as commutativity, associativity, idempotency, nilpotency, and distributivity; (2) nesting of two unary operations; (3) a binary operation nested in a unary operation; (4) cascading of two different binary operations; (5) general identities; and (6) operation transformation. The properties presented in this dissertation are quite exhaustive, but may not be complete. These properties provide the mathematical foundation for query decomposition and query optimization. Their utility in these two applications is also illustrated in this chapter. The proofs of the properties that are marked with † can be found in the Appendix. The others can be proved similarly.

5.1  Conventional Algebraic Properties

To be systematic, we first list the properties given in Section 4.3 without explanation, since they have been illustrated previously. Then, we give the properties of distributivity.
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A.  Commutativity 

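$\alpha\, *[R(A,B)]\, \beta = \beta\, *[R(B,A)]\, \alpha$  (5.1 †)

$\alpha\, |[R(A,B)]\, \beta = \beta\, |[R(B,A)]\, \alpha$  (5.2 †)

$\alpha\, ![R(A,B)]\, \beta = \beta\, ![R(B,A)]\, \alpha$  (5.3 †)

$\alpha\, \bullet\{W\}\, \beta = \beta\, \bullet\{W\}\, \alpha$  (5.4 †)

$\alpha + \beta = \beta + \alpha$  (5.5 †)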

B.  Associativity 

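$(\alpha_{\{X\}}\, *[R(A,B)]\, \beta_{\{Y\}})\, *[R(C,D)]\, \gamma_{\{Z\}}$
$= \alpha_{\{X\}}\, *[R(A,B)]\, (\beta_{\{Y\}}\, *[R(C,D)]\, \gamma_{\{Z\}})$   $(C \notin \{X\} \wedge B \notin \{Z\})$  (5.6 †)

$(\alpha_{\{X\}}\, |[R(A,B)]\, \beta_{\{Y\}})\, |[R(C,D)]\, \gamma_{\{Z\}}$
$= \alpha_{\{X\}}\, |[R(A,B)]\, (\beta_{\{Y\}}\, |[R(C,D)]\, \gamma_{\{Z\}})$   $(C \notin \{X\} \wedge B \notin \{Z\})$  (5.7 †)

$(\alpha_{\{X\}}\, \bullet\{W_1\}\, \beta_{\{Y\}})\, \bullet\{W_2\}\, \gamma_{\{Z\}} = \alpha_{\{X\}}\, \bullet\{W_1\}\, (\beta_{\{Y\}}\, \bullet\{W_2\}\, \gamma_{\{Z\}})$
$((\{W_1\} - \{W_2\}) \cap \{Z\} = \phi \wedge (\{W_2\} - \{W_1\}) \cap \{X\} = \phi)$  (5.8 †)

$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$  (5.9 †)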

C.  Idempotency  and  Nilpotency 

$\alpha \bullet \alpha = \alpha$  (if $\alpha$ is a homogeneous association-set)  (5.10)

$\alpha + \alpha = \alpha$  (5.11)

$A\, *[R(A,A)]\, A = A$  (5.12)

$A\, |[R(A,A)]\, A = \phi$  (5.13)

$\alpha \div \alpha = \alpha$  (5.14)


D.   Distributivity 

a) distributive property of * with respect to +:

$\alpha\, *[R(A,B)]\, (\beta + \gamma) = \alpha\, *[R(A,B)]\, \beta + \alpha\, *[R(A,B)]\, \gamma$  (5.15 †)

b) distributive property of | with respect to +:

$\alpha\, |[R(A,B)]\, (\beta + \gamma) = \alpha\, |[R(A,B)]\, \beta + \alpha\, |[R(A,B)]\, \gamma$  (5.16 †)

c) distributive property of • with respect to +:

$\alpha\, \bullet\{X\}\, (\beta + \gamma) = \alpha\, \bullet\{X\}\, \beta + \alpha\, \bullet\{X\}\, \gamma$  (5.17 †)

These three properties hold true for the same reasons. First, the A-Union
operation simply lumps together the patterns of two association-sets without
modifying them. Second, when two patterns are operated on by *, |, or •, the
production of a new pattern is independent of other patterns in the operand
association-sets, i.e., the decision whether a new pattern is produced or not
is based only on the structure of the two patterns being operated on.
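
The argument above can be checked on a toy model. The sketch below is not the
A-algebra itself: patterns are reduced to tuples of object identifiers, LINKS
is an invented set of associations, and associate is a hypothetical stand-in
for the pairwise step of the * operator. Because pairwise_op decides on each
pair of patterns independently, applying it to a union gives the same result
as taking the union of the two separate applications.

```python
from itertools import product

# Invented associations between object instances (assumption for illustration).
LINKS = {("a1", "b1"), ("a2", "b2"), ("a2", "c1")}


def associate(p, q):
    """Hypothetical pairwise rule: join two patterns if their end objects are linked."""
    return p + q if (p[-1], q[0]) in LINKS else None


def pairwise_op(left, right, combine):
    """Apply `combine` to every pair of patterns and keep the patterns it produces."""
    result = set()
    for p, q in product(left, right):
        produced = combine(p, q)
        if produced is not None:
            result.add(produced)
    return result


alpha = {("a1",), ("a2",)}
beta = {("b1",), ("b2",)}
gamma = {("c1",)}

# alpha * (beta + gamma)  versus  alpha * beta + alpha * gamma
lhs = pairwise_op(alpha, beta | gamma, associate)
rhs = pairwise_op(alpha, beta, associate) | pairwise_op(alpha, gamma, associate)
assert lhs == rhs
```

The same check passes for any combine function that looks only at the two
patterns it is given, which is the property the three distributive laws rely
on.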

d) distributive property of * with respect to •:

$\alpha_{\{X\}}\, *[R(CL_1,CL_2)]\, (\beta_{\{Y\}}\, \bullet\{W\}\, \gamma_{\{Z\}})$
$= \alpha_{\{X\}}\, *[R(CL_1,CL_2)]\, \beta_{\{Y\}}\ \bullet\{W \cup X\}\ \alpha_{\{X\}}\, *[R(CL_1,CL_2)]\, \gamma_{\{Z\}}$  (5.18 †)

e) distributive property of | with respect to •:

$\alpha_{\{X\}}\, |[R(CL_1,CL_2)]\, (\beta_{\{Y\}}\, \bullet\{W\}\, \gamma_{\{Z\}})$
$= \alpha_{\{X\}}\, |[R(CL_1,CL_2)]\, \beta_{\{Y\}}\ \bullet\{W \cup X\}\ \alpha_{\{X\}}\, |[R(CL_1,CL_2)]\, \gamma_{\{Z\}}$  (5.19)

Distributive properties d and e hold true under the following three
conditions:

i) $CL_2 \in W$;

ii) $X \cap Y = X \cap Z = \phi$; and

iii) $\alpha$ is a homogeneous association-set.

The first condition ensures that the *, |, and ! operations are performed on
the intersection of $\beta$ and $\gamma$. Otherwise, it does not make sense to
have an operation between $\alpha$ and $\gamma$. The second condition states
that $\alpha$ patterns are non-overlapping with $\beta$ and $\gamma$ patterns.
The third condition states that, on the right-hand side of the expression,
only the patterns having the same $\alpha$ patterns as their sub-patterns will
succeed in the A-Intersect operation. Although these two distributive
properties do not hold when one of the above three conditions is not true,
they are equivalent to some other expressions under a less restrictive
condition. These properties are classified in other categories.

It should be noted that two possible distributive properties are missing from
the above list. First, ! is not distributive with respect to +. This property
does not hold because of the way the NonAssociate operation is defined. By its
definition, a pattern in one association-set is included in the resultant
pattern iff it does not connect to any pattern in the other association-set.
This implies a logical AND. Therefore, the expressions
$\alpha\, !\, (\beta + \gamma)$ and $\alpha\, !\, \beta + \alpha\, !\, \gamma$
have different semantics. The former stands for patterns in $\alpha$ that are
not associated with patterns in both $\beta$ and $\gamma$, whereas the latter
specifies those patterns in $\alpha$ that are not associated with any pattern
in either $\beta$ or $\gamma$. Second, ! is not distributive with respect to
•. This property does not hold because performing the A-Intersect operation
first may drop some $\beta$ patterns that are associated with some $\alpha$
patterns, and dropping them may allow those $\alpha$ patterns to be
non-associated with the result of the A-Intersect operation. In contrast, when
the NonAssociate operation is performed first, those $\alpha$ patterns may not
appear in the final result.

The NonAssociate operator is not distributive with respect to the A-Union and
A-Intersect operations mainly because it is not associative. We shall see in
the rest of this chapter that it has fewer properties than the other
operators.
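
The "logical AND" reading of NonAssociate can be made concrete with a
deliberately simplified sketch. Here an association-set is reduced to a set of
object identifiers, LINKS is an invented set of associations, and
non_associated is a hypothetical reduction of the ! operator that keeps only
the left-hand objects (the real operator builds patterns with complement
edges). Under these assumptions, operating on a union behaves like an
intersection of the two separate results, not like their union.

```python
# Invented associations between object instances (assumption for illustration).
LINKS = {("a1", "b1"), ("a2", "c1")}


def non_associated(alpha, other):
    """Toy reading of '!': keep members of alpha linked to no member of `other`."""
    return {a for a in alpha if not any((a, o) in LINKS for o in other)}


alpha = {"a1", "a2", "a3"}
beta = {"b1"}
gamma = {"c1"}

nested = non_associated(alpha, beta | gamma)                         # alpha ! (beta + gamma)
spread = non_associated(alpha, beta) | non_associated(alpha, gamma)  # alpha ! beta + alpha ! gamma
both = non_associated(alpha, beta) & non_associated(alpha, gamma)    # logical AND of the two

assert nested == both      # the nested form requires non-association with beta AND gamma
assert nested != spread    # so it differs from the union of the separate results
```

In this example only a3 survives the nested form, while the union of the two
separate results contains all three objects.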

5.2 Nesting of Two Unary Operations

a) Two A-Select operations (one nested in the other):

Similar to the relational algebra, the order of the nesting of two selections
can be exchanged without affecting the final result. Alternatively, they can
be combined into a single selection operation whose selection condition is the
conjunction of the predicates of the original two A-Select operations.

$\sigma(\sigma(\alpha)[P_2])[P_1] = \sigma(\sigma(\alpha)[P_1])[P_2]$  (5.20 †)
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b)  Two  A-Project  operations  (one  nested  in  the  other): 

It should be obvious that the order of the nesting of two projection
operations cannot be exchanged unless they project the same thing, which is
not meaningful. However, they are equivalent to a single projection if the
outer A-Project operation projects subpatterns over patterns produced by the
inner A-Project.

$\Pi(\Pi(\alpha)[\mathcal{E}_2;\mathcal{T}_2])[\mathcal{E}_1;\mathcal{T}_1] = \Pi(\alpha)[\mathcal{E}_1;\mathcal{T}_1]$  (5.21)

$(\forall e_{1i}\ \exists e_{2j}\ (e_{1i} \in \mathcal{E}_1 \wedge e_{2j} \in \mathcal{E}_2 \wedge e_{1i} \subseteq e_{2j}))$

where the $e_{1i}$'s are subpattern expressions of the first A-Project
operation and the $e_{2j}$'s are subpattern expressions of the second
A-Project operation; and $e_{1i} \subseteq e_{2j}$ means that $e_{1i}$ defines
a subpattern of $e_{2j}$.

c)  Two  A-Integrate  operations  (one  nested  in  the  other): 

By the definition of the A-Integrate operation, applying an A-Integrate
operation a second time to an association-set has no effect on the result of
the first operation. Therefore, we have

$I(I(\alpha)) = I(\alpha)$  (5.23)

Since an A-Integrate operation with a set of specified classes performs only
part of the function of an A-Integrate operation without a set of specified
classes, the following equations also hold true.

$I(I_{\{W\}}(\alpha)) = I(\alpha)$  (5.24)

$I_{\{W\}}(I(\alpha)) = I(\alpha)$  (5.25)


d) A-Select nested in A-Project, or vice versa:

A selection operation performed on the result of a projection operation is
equivalent to the projection performed on the result of the selection, since
the selection condition applicable to the projected subpatterns must be
applicable to the patterns before the projection. However, it is not true for
the other direction.

$\sigma(\Pi(\alpha)[\mathcal{E};\mathcal{T}])[P] \Rightarrow \Pi(\sigma(\alpha)[P])[\mathcal{E};\mathcal{T}]$  (5.26)

For the other direction to be true, the classes involved in the predicate of
the selection condition should also appear in the $[\mathcal{E};\mathcal{T}]$
clause of the projection operation (denoted as PCS), which defines the
subpattern(s) to be projected out. Otherwise, the result of the selection is
always an empty set because the predicate is not applicable to the projected
patterns. Therefore, the above property holds true for both directions when
the condition holds, and thus we have

$\Pi(\sigma(\alpha)[P])[\mathcal{E};\mathcal{T}] = \sigma(\Pi(\alpha)[\mathcal{E};\mathcal{T}])[P]$   [PCS]  (5.27 †)

5.3 A Binary Operation Nested in a Unary Operation

5.3.1 Binary operation nested in an A-Select

a) Associate, A-Complement, or A-Intersect nested in A-Select

Generally speaking, transforming an expression of a binary operation
(Associate, A-Complement, or A-Intersect) nested in a selection into another
expression is impossible, since the predicate of the selection operation can
be very complicated. For this reason, we study only the simple case in which
the predicate has the form $P_1 \wedge P_2$ or $P_1 \vee P_2$, and $P_1$ and
$P_2$ are only applicable to $\alpha$ and $\beta$, respectively. The following
properties are similar to those in relational algebra and need no explanation.

For $P_1 \wedge P_2$, we have

$\sigma(\alpha\, *[R(A,B)]\, \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1]\, *[R(A,B)]\, \sigma(\beta)[P_2]$  (5.28 †)

$\sigma(\alpha\, |[R(A,B)]\, \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1]\, |[R(A,B)]\, \sigma(\beta)[P_2]$  (5.29)

$\sigma(\alpha \bullet \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1] \bullet \sigma(\beta)[P_2]$  (5.30)

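For $P_1 \vee P_2$, we have

$\sigma(\alpha\, *[R(A,B)]\, \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1]\, *[R(A,B)]\, \beta + \alpha\, *[R(A,B)]\, \sigma(\beta)[P_2]$  (5.31 †)

$\sigma(\alpha\, |[R(A,B)]\, \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1]\, |[R(A,B)]\, \beta + \alpha\, |[R(A,B)]\, \sigma(\beta)[P_2]$  (5.32)

$\sigma(\alpha \bullet \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] \bullet \beta + \alpha \bullet \sigma(\beta)[P_2]$  (5.33)

We note that the above properties are not true for a NonAssociate operation
nested in an A-Select. The reason is similar to what we have explained in the
section on the distributive properties.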

b) A-Difference nested in A-Select

Since both A-Difference and A-Select operations perform a restriction on an
association-set and produce a subset of patterns without changing their
original structures, an A-Select operation performed on the minuend or on the
result of the A-Difference operation will produce the same result.

$\sigma(\alpha - \beta)[P] = \sigma(\alpha)[P] - \beta$  (5.34 †)
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c)  A-Union  nested  in  A-Select 

It should be obvious that the following equation is always true:

$\sigma(\alpha + \beta)[P] = \sigma(\alpha)[P] + \sigma(\beta)[P]$  (5.35 †)

In the special case that $P$ has the form $P_1 \vee P_2$, and $P_1$ and $P_2$
can be applied to $\alpha$ and $\beta$, respectively, we have

$\sigma(\alpha + \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] + \sigma(\beta)[P_2]$  (5.36 †)

5.3.2 Binary operation nested in A-Project or A-Integrate

Since A-Project and A-Integrate operations produce patterns which may contain
subpatterns of both operands of the nested binary operation, properties
similar to those presented above do not hold in general, except for the
nesting of an A-Union operation.

$\Pi(\alpha + \beta)[\mathcal{E};\mathcal{T}] = \Pi(\alpha)[\mathcal{E};\mathcal{T}] + \Pi(\beta)[\mathcal{E};\mathcal{T}]$  (5.37 †)

$I(\alpha + \beta) = I(I(\alpha) + I(\beta))$  (5.38)

$I_{\{W\}}(\alpha + \beta) = I_{\{W\}}(I_{\{W\}}(\alpha) + I_{\{W\}}(\beta))$  (5.39)


5.4 Cascading of Two Binary Operations

5.4.1 Cascading of two identical binary operators

Most cases have been covered by the associativity properties. Although
associativity does not hold for the operators - and ÷, there exist some
equivalent expressions. The cascading of two A-Difference operations follows
the set-difference in set theory.

$\alpha - \beta - \gamma = \alpha - \gamma - \beta = \alpha - (\beta + \gamma)$  (5.40 †)
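
Property 5.40 mirrors a familiar identity of ordinary set difference. The
short check below uses plain Python sets of integers purely as the
set-theoretic analogue; association-sets and the A-Difference operator are
richer than this, so the snippet illustrates the shape of the identity rather
than the operator itself.

```python
alpha = {1, 2, 3, 4, 5}
beta = {2, 3}
gamma = {3, 4}

left = (alpha - beta) - gamma      # alpha - beta - gamma
swapped = (alpha - gamma) - beta   # alpha - gamma - beta
merged = alpha - (beta | gamma)    # alpha - (beta + gamma)

assert left == swapped == merged
print(sorted(left))  # [1, 5]
```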


The cascading of two A-Divide operations is equivalent to the dividend divided
by the A-Union of the two divisors, because an A-Divide operation retains
patterns of the dividend without modifying their structures (note that the
divide operation in relational algebra retains a substructure of the
dividend). Therefore, the order of the two A-Divide operations is not
important.

$\alpha\, \div\{W\}\, \beta\, \div\{W\}\, \gamma = \alpha\, \div\{W\}\, \gamma\, \div\{W\}\, \beta = \alpha\, \div\{W\}\, (\beta + \gamma)$  (5.41 †)


5.4.2 Cascading of two different binary operations

Many cases have been covered by the distributive properties. Although the
distributivity properties of ! and ÷ with respect to + do not hold, there
still exist some equivalent expressions. These properties are listed below
according to their first operators.

a) * with other binary operators

The cascading of * and | operators is associative.

$(\alpha_{\{X\}}\, *[R(A,B)]\, \beta_{\{Y\}})\, |[R(C,D)]\, \gamma_{\{Z\}} = \alpha_{\{X\}}\, *[R(A,B)]\, (\beta_{\{Y\}}\, |[R(C,D)]\, \gamma_{\{Z\}})$  (5.42 †)

$(C \notin \{X\} \wedge B \notin \{Z\})$

The condition ensures that the operation *[R(A,B)] does not operate on
$\gamma$ patterns and |[R(C,D)] does not operate on $\alpha$ patterns.
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For  the  cascading  of  *  and  -  operators  (in  that  order),  it  should  be  obvious 
that when the subtrahend is applicable to only one of the operands of the *
operation, the - operation can be performed first and just against that
operand.

$(\alpha_{\{X\}}\, *[R(A,B)]\, \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha_{\{X\}} - \gamma_{\{Z\}})\, *[R(A,B)]\, \beta_{\{Y\}}$   $(\{Y\} \cap \{Z\} = \phi)$  (5.43 †)

$= \alpha_{\{X\}}\, *[R(A,B)]\, (\beta_{\{Y\}} - \gamma_{\{Z\}})$   $(\{X\} \cap \{Z\} = \phi)$

For a similar reason, the following property holds true.

$(\alpha_{\{X\}}\, *[R(A,B)]\, \beta_{\{Y\}}) \bullet \gamma_{\{Z\}} = (\alpha_{\{X\}} \bullet \gamma_{\{Z\}})\, *[R(A,B)]\, \beta_{\{Y\}}$  (5.44 †)

$(\{Y\} \cap \{Z\} = \phi \wedge \{Y\} \cap \{X\} = \phi \wedge A \in \{X\})$

$= \alpha_{\{X\}}\, *[R(A,B)]\, (\beta_{\{Y\}} \bullet \gamma_{\{Z\}})$

$(\{X\} \cap \{Z\} = \phi \wedge \{X\} \cap \{Y\} = \phi \wedge B \in \{Y\})$

The first two conditions ensure that $\gamma$ patterns do not intersect with
$\alpha$ and $\beta$ patterns. Otherwise, the A-Intersect operation will be
performed over the common classes of $\beta$ and $\gamma$ if the * operation
is performed first. The third condition ensures that $\alpha$ ($\beta$) must
contain object instances of A (B). In other words, the algebraic expression
that defines $\alpha$ ($\beta$) must contain A (B). Otherwise, performing the
A-Intersect operation first may produce a false result when $\gamma$ contains
object instances of A. Note that the right-hand side of the equation is in a
distributive form of * with respect to •. However, the distributive property
cannot be applied, since it requires that A belong to $\alpha$ and $\beta$,
and that $\gamma$ be a homogeneous association-set (refer to Section 5.1).
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b)  |  with  other  binary  operators 

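Similar to the above two properties, we have

$(\alpha_{\{X\}}\, |[R(A,B)]\, \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha_{\{X\}} - \gamma_{\{Z\}})\, |[R(A,B)]\, \beta_{\{Y\}}$   $(\{Y\} \cap \{Z\} = \phi)$  (5.45)

$= \alpha_{\{X\}}\, |[R(A,B)]\, (\beta_{\{Y\}} - \gamma_{\{Z\}})$   $(\{X\} \cap \{Z\} = \phi)$

$(\alpha_{\{X\}}\, |[R(A,B)]\, \beta_{\{Y\}}) \bullet \gamma_{\{Z\}} = (\alpha_{\{X\}} \bullet \gamma_{\{Z\}})\, |[R(A,B)]\, \beta_{\{Y\}}$  (5.46)

$(\{Y\} \cap \{Z\} = \phi \wedge \{Y\} \cap \{X\} = \phi \wedge A \in \{X\})$

$= \alpha_{\{X\}}\, |[R(A,B)]\, (\beta_{\{Y\}} \bullet \gamma_{\{Z\}})$

c) • with other binary operators

Similar to equations 5.43 and 5.45, we have

$(\alpha_{\{X\}} \bullet \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha_{\{X\}} - \gamma_{\{Z\}}) \bullet \beta_{\{Y\}}$   $(\{Y\} \cap \{Z\} = \phi)$  (5.47)

$= \alpha_{\{X\}} \bullet (\beta_{\{Y\}} - \gamma_{\{Z\}})$   $(\{X\} \cap \{Z\} = \phi)$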


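d) ! with other operators

As we have mentioned earlier, the ! operator has fewer properties because it
is not associative. Although ! is not distributive with respect to +, the
following decomposition holds true:

$\alpha\, ![R(A,B)]\, (\beta + \gamma)$  (5.48 †)

$= \alpha\, ![R(A,B)]\, \beta - \Pi(\alpha\, *[R(A,B)]\, \gamma)[\alpha] + \alpha\, ![R(A,B)]\, \gamma - \Pi(\alpha\, *[R(A,B)]\, \beta)[\alpha]$

or

$\alpha\, ![R(A,B)]\, (\beta + \gamma)$  (5.49)

$= (\alpha - \Pi(\alpha\, *[R(A,B)]\, \beta)[\alpha] - \Pi(\alpha\, *[R(A,B)]\, \gamma)[\alpha])$

$![R(A,B)]\ ((\beta - \Pi(\alpha\, *[R(A,B)]\, \beta)[\beta]) + (\gamma - \Pi(\alpha\, *[R(A,B)]\, \gamma)[\gamma]))$

where $\alpha$, $\beta$, and $\gamma$ are homogeneous association-sets.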
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The  significance  of  equations  5.48  and  5.49  is  that  they  can  be  used  to 
transform the original expressions, in which the ! operators operate on
heterogeneous association-sets (e.g., $\alpha + \beta$) for which
distributivity cannot be applied, into expressions in the form of A-Unions of
homogeneous association-sets.

e) ÷ with other operators

An association-set ($\alpha$) divided by the A-Union of two other
association-sets ($\beta$ and $\gamma$) is equivalent to two consecutive
A-Divide operations of $\alpha$ divided by $\beta$ and $\gamma$ in turn, as
indicated in equation 5.41. The order of the two A-Divide operations is not
important.

$\alpha\, \div\{W\}\, (\beta + \gamma) = \alpha\, \div\{W\}\, \beta\, \div\{W\}\, \gamma = \alpha\, \div\{W\}\, \gamma\, \div\{W\}\, \beta$  (5.50)

The A-Divide operator also has fewer properties because it is not associative.

f) - with other binary operators

The properties of the - operator cascaded with other operators are covered by
5.43, 5.45, and 5.47.

g) + with other binary operators

The equation below follows the set-union and set-difference operations in set
theory.

$(\alpha + \beta) - \gamma = (\alpha - \gamma) + (\beta - \gamma)$  (5.51 †)

The properties of cascading + with the operators *, |, •, and ! can be found
in 5.15, 5.16, 5.17, 5.48, and 5.49, since the latter operators are
commutative.

5.5 General Identities

There are many other properties which are unique to the A-algebra but cannot
be classified into the above categories. Listed below are some identity
properties. These identities are useful for expression reduction.

A  •  A  *  B  =  A  *  B  (5.52) 

A • A ! B = A ! B  (5.53)

A  +  I1(A\B)[A]  =  A  (5.54) 

A  *  B  *  C  •  A  *  B  =  A  *  B  *  C                                                                 (5.55) 

5.6 Transformation of Operators

An important fact we have observed is that the same pattern can be constructed
by different algebraic expressions using different operators. For example, the
pattern A-B-C can be constructed either by A*B*C or by B*A • B*C, hence

B*A • B*C = A * B * C  (5.56)

Formally, their equivalence can be derived using the properties presented in
the previous sections:

  B * A • B * C
= (B • B * C) *[R(B,A)] A    (by 5.44)
= (B * C) *[R(B,A)] A        (by 5.52)
= A * (B * C)                (by 5.1)
= A * B * C                  (by 5.6)

For the other direction, we have

  A * B * C
= A * (B * B) * C            (by 5.10)
= A * (B • B) * C            (by 5.10, 5.12)
= (A * B • B) * C            (by 5.44)
= A * B • B * C              (by 5.44)

Using this property, a pattern of tree structure can be described without
using the A-Intersect operator, which is relatively more expensive to
implement. For example,

  A * (B*C • B*D)
= A *[R(A,B)] (C * B * D)        (by 5.56)
= A * (B * C *[R(B,D)] D)        (by 5.1, 5.6)
= A * B * C *[R(B,D)] D          (by 5.6)

Another useful transformation is possible because a pattern of lattice
structure expressed by an intersection of two linear patterns can be viewed as
a selection on linear patterns, which avoids the expensive A-Intersect
operation. For example,

A*B*C*D • B*E*D = σ(A*B*C*D*E*B_1)[B = B_1].  (5.57)

The left-hand side constructs a lattice pattern by intersecting two linear
patterns over classes B and D. By breaking the lattice pattern at B, it
becomes a single linear pattern as seen on the right-hand side of the above
expression. Here, B_1 is an alias of B. By specifying that B = B_1 in the
association-set defined by A*B*C*D*E*B_1, we obtain the same result as the
expression on the left-hand side.

Based on these two transformation properties, a complicated network structure
can be viewed as a forest structure by properly breaking all the loops in the
network, and its algebraic expression can be specified using the σ, *, |, and
! operators.
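
Property 5.57 can be tried out on a small, made-up object graph. In the sketch
below the object identifiers, the LINKS table, and the helpers base,
instances, linked, and chain are all illustrative assumptions; patterns are
simplified to one object per class, which is far less general than the
association patterns of the A-algebra. The left-hand side intersects two
linear patterns over B and D, and the right-hand side builds one longer linear
pattern ending in the alias B_1 and then selects on B = B_1.

```python
# Invented, undirected associations between instances, keyed by class pair.
LINKS = {
    ("A", "B"): {("a1", "b1"), ("a2", "b2")},
    ("B", "C"): {("b1", "c1"), ("b2", "c2")},
    ("C", "D"): {("c1", "d1"), ("c2", "d2")},
    ("B", "E"): {("b1", "e1"), ("b2", "e2"), ("b2", "e1")},
    ("E", "D"): {("e1", "d1"), ("e2", "d9")},
}


def base(name):
    """Map an alias such as 'B_1' back to its class 'B'."""
    return name.split("_")[0]


def instances(cls):
    """All object identifiers of a class that appear in LINKS."""
    found = set()
    for (c1, c2), pairs in LINKS.items():
        for x, y in pairs:
            if c1 == cls:
                found.add(x)
            if c2 == cls:
                found.add(y)
    return found


def linked(cls1, x, cls2, y):
    """True if x (of cls1) and y (of cls2) are associated, in either direction."""
    if (cls1, cls2) in LINKS:
        return (x, y) in LINKS[(cls1, cls2)]
    return (y, x) in LINKS.get((cls2, cls1), set())


def chain(names):
    """Enumerate linear patterns such as A*B*C as dicts {name: object}."""
    rows = [{names[0]: x} for x in sorted(instances(base(names[0])))]
    for prev, cur in zip(names, names[1:]):
        rows = [dict(row, **{cur: y})
                for row in rows
                for y in sorted(instances(base(cur)))
                if linked(base(prev), row[prev], base(cur), y)]
    return rows


# Left-hand side: A*B*C*D intersected with B*E*D over the common classes B and D.
left = {(p["A"], p["B"], p["C"], p["D"], q["E"])
        for p in chain(["A", "B", "C", "D"])
        for q in chain(["B", "E", "D"])
        if p["B"] == q["B"] and p["D"] == q["D"]}

# Right-hand side: select B = B_1 on the single linear pattern A*B*C*D*E*B_1.
right = {(r["A"], r["B"], r["C"], r["D"], r["E"])
         for r in chain(["A", "B", "C", "D", "E", "B_1"])
         if r["B"] == r["B_1"]}

assert left == right
```

In this data e1 is linked to both b1 and b2, so the linear pattern produces two
candidate rows for a1; the selection B = B_1 discards the one that closes the
loop at the wrong B object.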

5.7 Applications in Query Optimization and Query Decomposition

We have systematically presented the mathematical properties of the operators
of the A-algebra. In this section, their utility in query optimization and
query decomposition will be illustrated.


5.7.1 Applications in query optimization

Generally, query processing consists of three phases: translation,
optimization, and execution. A query issued by the user is expressed in a
high-level language. First, it is translated into an internal representation,
an access plan, which may not be efficient for execution. Then, the optimizer
generates a new access plan which is equivalent to the original access plan
(i.e., they produce the same result) and is "optimal" for execution. Finally,
the new access plan is scheduled for execution by the transaction manager to
produce the result of the query. Since it is difficult to determine the
equivalence of two statements in a high-level language, alternative access
plans cannot be generated by the query translator. In relational databases,
the access plan generated by the query translator takes the form of a query
tree whose nodes are relational algebra operators, so that the mathematical
properties of the algebra can be used to generate equivalent access plans,
even if the high-level language is based on the relational tuple calculus or
domain calculus (refer to Chapter 2).

Query optimization is, in general, an NP-hard problem. Therefore, an access
plan generated by the optimizer is optimal only in a very restrictive sense.
Furthermore, to be practical, the overhead of the optimizer should never
exceed the benefit of query optimization. In general, a query optimizer
generates an optimal access plan in two steps: (1) generate a (limited number
of) equivalent access plans, and (2) evaluate these access plans based on (a
few) system parameters and criteria.

The mathematical properties of the A-algebra presented above are the
foundation for the first step of query optimization in O-O databases. In the
second step, the system/application chooses one or more of the following as
the goal of its query optimization: minimal response time, minimal execution
time, minimal communication time, minimal storage space, maximal resource
utilization, etc. The parameters used in estimating the performance of an
access plan include communication cost (per block), CPU cost (per unit), I/O
cost (per I/O), buffer size, selectivities of operations (e.g., Selection and
Join in relational databases), data structures, algorithms of the operations
(e.g., nested-join, hash-join), etc.

Since the criteria of optimization are system/application dependent and the
optimization strategies vary from system to system, a detailed study is
outside the scope of this dissertation. We shall give an example to
demonstrate the importance of the A-algebra in query optimization.
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Query   8:       List    GPAs   of  students  who   major    and   minor   in   the   same 
departments. 

The intensional pattern for this query is shown in Figure 5.1a. Suppose that
the algebraic expression produced by the query translator is as follows, which
corresponds to an access plan represented by the query tree shown in Figure
5.1b.

Π(GPA * (Student * Department • Student * Undergrad * Department))[GPA]

To make the evaluation easy, we assume that every student has a major, a
minor, and a GPA (i.e., the selectivities of all * operations are 1.0) and
that 100 out of $10^4$ students major and minor in the same departments (i.e.,
the selectivity of the • operation is $1/10^2$). If the time to perform an
A-Select on a pattern is 1 unit, to perform an Associate operation is 2 units,
and to perform an A-Intersect operation is 5 units, the total execution time,
not including the time for the A-Project operation, can be calculated as
follows:

$T_1 = (2 \times 10^4) + (4 \times 10^4) + (5 \times 10^4) + 200 = 11.02 \times 10^4$

where the first term is the time for identifying students' majors, the second
term is for identifying students' minors, the third term is for the
A-Intersect operation, and the last term is for identifying the GPAs. In
Figure 5.1b, the costs of operations are depicted next to the operator nodes.
Here, the time for the A-Intersect operation is small because each student has
only one major and one minor and indices may be used to speed up the
operation.

Using property 5.57, the same intensional pattern can be viewed as the linear
pattern shown in Figure 5.2a, and thus the optimizer generates a new algebraic
expression, which corresponds to the access plan shown in Figure 5.2b.

Π(σ(GPA * Student * Department * Undergrad * Student_1)
    [Student = Student_1])[GPA]

The total execution time for this access plan is

$T_2 = (8 \times 10^4) + (10^4) = 9 \times 10^4$

where the first term is the time for four Associate operations and the second
term is the time for the selection operation. It is less expensive than the
original access plan and is thus a better plan.

However, suppose that the database is a distributed one in which the data of
students' GPAs are at site 1 and the other data are at site 2 (the class
Student has to be replicated at both sites). The communication cost is assumed
to be 1000 units per block, with a block size of 100 patterns. The total
execution times for these two access plans can be calculated as follows:

$T_1 = (2 \times 10^4) + (4 \times 10^4) + (5 \times 10^4) + 1000 + 200 = 11.12 \times 10^4$
$T_2 = (8 \times 10^4) + (10^4) + 10^5 = 19 \times 10^4$

In $T_1$, the fourth term is the communication cost for sending qualified
students to site 1. In $T_2$, the third term is the communication cost (the
communication costs are the same for sending the GPAs of all students to site
2 and for sending students' majors and minors to site 1). In this case, the
first access plan is better than the second. Figures 5.3a and 5.3b depict the
costs of operations (next to the operations) and the costs of communications
(on the edges) for these two access plans.

The optimizer of the distributed system may generate another access plan by
applying property 5.28 to the algebraic expression of the second access plan,
which gives

Π(GPA * σ(Student * Department * Undergrad * Student_1)
    [Student = Student_1])[GPA]

which corresponds to the access plan shown in Figure 5.3c. The total execution
time for this access plan is

$T_3 = (6 \times 10^4) + 10^4 + 10^3 + 200 = 7.12 \times 10^4$

where the first term is the time for the three Associate operations nested in
the A-Select, the second term accounts for the selection operation, the third
term accounts for the communication cost, and the last term is the time for
getting the GPAs. Therefore, the third access plan is the best of the three
for execution.
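
The arithmetic behind the three access plans can be replayed with a few lines
of code. This is only a sketch of the cost model as stated in the example: 10^4
students of whom 100 qualify, 1, 2, and 5 time units per pattern for A-Select,
Associate, and A-Intersect, and 1000 units per block of 100 patterns shipped
between sites. The constant and function names are invented here, and rounding
partial blocks up to whole blocks is an added assumption that does not affect
these particular figures.

```python
STUDENTS = 10**4                           # total number of students
QUALIFIED = 100                            # students whose major and minor coincide
SELECT, ASSOCIATE, INTERSECT = 1, 2, 5     # time units per pattern
BLOCK_SIZE, BLOCK_COST = 100, 1000         # patterns per block, units per block


def comm(patterns):
    """Cost of shipping `patterns` patterns, charged per whole block (assumed round-up)."""
    blocks = -(-patterns // BLOCK_SIZE)    # ceiling division
    return blocks * BLOCK_COST


# Centralized database.
plan1_local = (1 * ASSOCIATE * STUDENTS    # majors: one Associate
               + 2 * ASSOCIATE * STUDENTS  # minors: two Associates
               + INTERSECT * STUDENTS      # A-Intersect
               + ASSOCIATE * QUALIFIED)    # GPAs of the qualified students
plan2_local = 4 * ASSOCIATE * STUDENTS + SELECT * STUDENTS

# Distributed database: GPAs at site 1, everything else at site 2.
plan1_dist = plan1_local + comm(QUALIFIED)
plan2_dist = plan2_local + comm(STUDENTS)
plan3_dist = (3 * ASSOCIATE * STUDENTS     # three Associates inside the A-Select
              + SELECT * STUDENTS          # the selection itself
              + comm(QUALIFIED)            # ship only the qualified students
              + ASSOCIATE * QUALIFIED)     # attach their GPAs

for name, cost in [("T1 local", plan1_local), ("T2 local", plan2_local),
                   ("T1 distributed", plan1_dist), ("T2 distributed", plan2_dist),
                   ("T3 distributed", plan3_dist)]:
    print(f"{name}: {cost}")
```

The ranking flips between the two settings because plan 2 must ship all 10^4
patterns across sites, whereas plans 1 and 3 ship only the 100 qualified ones.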

5.7.2    Applications  in  query  decomposition 

The O-O modeling techniques incorporate many high-level features such as
association types, inheritance, behavioral properties of objects, knowledge
and rules, etc. in the DBMS. In conventional database systems, these features
were taken care of by database administrators and application programs. To
ensure good performance, O-O DBMSs need the support of parallel and
distributed processing techniques.

In a distributed and parallel processing environment, a query is decomposed
into subqueries according to the processing capabilities of the processors
and/or the data distribution. The algebraic representation of a query can be
manipulated mathematically for this purpose. For example, suppose a query is
represented by the intensional pattern shown in Figure 5.4a. The algebraic
expression for this query can be written as follows:

expr = A * (B*E*F + B*(C*D*H • C*G)).

By applying the distributivity properties, the above expression can be
rewritten as below:

expr = A * (B*E*F + B*C*D*H • B*C*G)
     = A*B*E*F + A * (B*C*D*H • B*C*G)
     = A*B*E*F + A*B*C*D*H • A*B*C*G.

The decomposed expression is the A-Union of two sub-expressions representing
the two sub-patterns shown in Figure 5.4b. These sub-expressions are
independent of each other and can be processed in parallel in a parallel
system. The second sub-expression can be further optimized as shown in the
following expression, in which *[R(C,G)] indicates that the Associate
operation is performed through the association between C and G.

expr = A*B*E*F + (A*B*C*D*H) *[R(C,G)] G.

In addition, since each sub-expression represents a homogeneous
association-set, its processing will be more efficient than processing over
heterogeneous association-sets.
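
The step that pulls A-Union to the top of an expression is mechanical enough
to sketch in code. The fragment below is an illustration under stated
assumptions, not the decomposition procedure of any actual system: expressions
are nested tuples (op, left, right) with class names as leaves, the function
names are invented, and only the operators *, |, and • are distributed over +,
following properties 5.15 to 5.17 together with commutativity. Other operators
are rejected rather than guessed at.

```python
# Operators for which distribution over '+' is encoded (properties 5.15-5.17).
DISTRIBUTIVE = {"*", "|", "•"}


def union_terms(expr):
    """Return the union-free terms whose A-Union is equivalent to `expr`."""
    if isinstance(expr, str):              # a leaf: a class name
        return [expr]
    op, left, right = expr
    lefts, rights = union_terms(left), union_terms(right)
    if op == "+":
        return lefts + rights
    if op not in DISTRIBUTIVE:
        raise ValueError(f"no distributive rule encoded for {op!r}")
    return [(op, l, r) for l in lefts for r in rights]


def show(expr):
    """Render an expression tree as a fully parenthesized string."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({show(left)} {op} {show(right)})"


# expr = A * (B*E*F + B*(C*D*H • C*G))
expr = ("*", "A",
        ("+",
         ("*", ("*", "B", "E"), "F"),
         ("*", "B", ("•", ("*", ("*", "C", "D"), "H"), ("*", "C", "G")))))

terms = [show(t) for t in union_terms(expr)]
for t in terms:
    print(t)
```

Each printed term contains no A-Union, so each could be handed to a separate
processor; the further rewriting of the second term (distributing * over •) is
a separate step that this sketch does not attempt.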

Next, we present two theorems of the A-algebra, which ensure that the
decomposed sub-expressions produce homogeneous association-sets.

Theorem 5.1:

Operators (except A-Union and A-Integrate) of the A-algebra produce
homogeneous association-sets if their operands are homogeneous
association-sets.

Proof: This is true by the definitions of the operators (the A-Intersect
operation should be used without specifying the classes on which it is
performed, i.e., it performs on the common classes of its operands). Note
that, for A-Difference and A-Divide operations, this is also true if only the
first operand (the minuend or the dividend) is a homogeneous association-set.

Theorem  5.2: 

If  an  A-algebra  expression  which  does  not  contain  A-Integrate  opera- 
tion and  A-Divide  operation  whose  dividend  is  an  heterogeneous 
association-set,  it  can  be  decomposed  into  the  A-Union's  of  some  sub- 
expressions, each  of  which  produces  a  homogeneous  association-set. 

Proof:  According  to  Theorem  5.1,  besides  the  A-Integrate  operation,  the  A-Union 
is  the  only  operator  that  can  produce  heterogeneous  association-set  when  its 
operands  are  homogeneous  association-sets.  Therefore,  it  suffices  to  prove  that 
whenever  such  heterogeneous  association-set  appears  in  an  expression,  the  expres- 
sion can  be  decomposed  into  the  A-Union  of  sub-expressions  which  produce  homo- 
geneous association-sets. 

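Proof:  Let $\alpha$, $\beta$, $\gamma$, and $\lambda$ all be homogeneous association-sets.  By properties
5.15, 5.16, 5.17, 5.35, and 5.37, we have

$$(\alpha + \beta) * (\gamma + \lambda) = \alpha*\gamma + \alpha*\lambda + \beta*\gamma + \beta*\lambda$$

$$(\alpha + \beta) \,|\, (\gamma + \lambda) = \alpha|\gamma + \alpha|\lambda + \beta|\gamma + \beta|\lambda$$

$$(\alpha + \beta) \bullet (\gamma + \lambda) = \alpha\bullet\gamma + \alpha\bullet\lambda + \beta\bullet\gamma + \beta\bullet\lambda$$

$$\sigma(\alpha + \beta)[P] = \sigma(\alpha)[P] + \sigma(\beta)[P]$$

$$\Pi(\alpha + \beta)[E;T] = \Pi(\alpha)[E;T] + \Pi(\beta)[E;T]$$

By property 5.56, we have

$$(\alpha + \beta) - \gamma = (\alpha - \gamma) + (\beta - \gamma)$$

By property 5.42, we have

$$\begin{aligned}
(\alpha + \beta) \,!\, (\gamma + \lambda) = {} & (\alpha\,!\,\gamma - \Pi(\alpha*\lambda)[\alpha] - \Pi(\beta*\gamma)[\gamma]) \\
& + (\alpha\,!\,\lambda - \Pi(\alpha*\gamma)[\alpha] - \Pi(\beta*\lambda)[\lambda]) \\
& + (\beta\,!\,\gamma - \Pi(\beta*\lambda)[\beta] - \Pi(\alpha*\gamma)[\gamma]) \\
& + (\beta\,!\,\lambda - \Pi(\beta*\gamma)[\beta] - \Pi(\alpha*\lambda)[\lambda])
\end{aligned}$$

In the above decompositions, each term of the A-Union operations represents a
homogeneous association-set.  □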
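
The decomposition used in this proof can be mimicked mechanically. The following Python sketch is only an illustration under stated assumptions: expressions are encoded as nested tuples (a hypothetical encoding, not one defined in this dissertation), the strings "*", "|", and "." stand for Associate, A-Complement, and A-Intersect, and only the distribution of these three operators over A-Union ("+") is applied.

```python
from itertools import product

# Hypothetical encoding: a string names a homogeneous operand; a tuple is
# (operator, left, right). "+" stands for A-Union.
DISTRIBUTIVE = {"*", "|", "."}

def union_terms(expr):
    """Return the union-free terms whose A-Union is equivalent to expr."""
    if isinstance(expr, str):
        return [expr]
    op, left, right = expr
    if op == "+":
        # The terms of an A-Union are the terms of both operands.
        return union_terms(left) + union_terms(right)
    if op in DISTRIBUTIVE:
        # Distribute the operator over every pair of terms of its operands.
        return [(op, l, r) for l, r in product(union_terms(left), union_terms(right))]
    raise ValueError(f"no distributive rule sketched for {op!r}")

# (a + b) * (c + d) decomposes into four union-free terms.
expr = ("*", ("+", "a", "b"), ("+", "c", "d"))
terms = union_terms(expr)
```

A-Select, A-Project, A-Difference, and NonAssociate are left out of the sketch on purpose; their rules in the proof have a different shape and would need their own cases.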

Figure 5.1  Access plan 1 of Query 8
Figure 5.2  Access plan 2 of Query 8
Figure 5.3  Costs in a distributed system
Figure 5.4  Example of query decomposition


CHAPTER  6 
COMPLETENESS  OF  THE  A-ALGEBRA 


We have shown in the preceding sections that a query issued against an O-O
database can be specified by an association (or graphic) pattern, in which object
instances of interest are related (associated or nonassociated), and that the
A-algebra provides a useful mathematical method for specifying and manipulating
such patterns to produce the result of the query.  However, for the algebra to be
truly useful, its completeness needs to be addressed.

Due to the closure property of the A-algebra, the result of a query is
represented intensionally by a subdatabase schema graph $SG_s$ and extensionally by
a subdatabase object graph $OG_s$, where $SG_s$ is a subgraph of the SG of the
original database and $OG_s$ is a subset of the association patterns in the original
object graph OG.  A subdatabase can be further operated upon by the A-algebra
operators to produce other subdatabases.  We can therefore define the completeness
of the algebra in the following way.

Completeness Theorem:

The A-algebra is complete if it can define all possible subdatabases of an O-O
database.

Before proving the theorem, we first give the formal definitions of the $SG_s$
and $OG_s$ of the subdatabases of an O-O database.

Subdatabase Schema Graph:

A subdatabase schema graph ($SG_s$) is a set of m connected subgraphs,
$\{SG_s^i(C,A)\}$ ($i = 1, 2, \ldots, m$), from the original database schema graph SG(C,A),
where C is a set of vertices representing classes $\{C_i\}$ and A is a set of edges
representing associations between classes, each of which is denoted by $A_{ij}$
for an association between classes $C_i$ and $C_j$.  If $C_i \in SG_s^j$, then
$C_i \notin SG_s^k$ ($\forall k \neq j$).


The  condition  ensures  that  a  class  does  not  appear  in  two  different  connected 
graphs  in  a  subdatabase.  If  it  does,  the  two  connected  graphs  should  have  been  a 
single  connected  graph. 
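
The condition amounts to saying that the connected subgraphs of a subdatabase schema partition its classes. As a minimal sketch, assuming the sub-schema is given as a set of class names and a list of class pairs (a hypothetical representation chosen only for illustration), the subgraphs can be recovered as connected components:

```python
def connected_subgraphs(classes, associations):
    """Group classes into connected subgraphs; associations are (class, class) pairs."""
    neighbours = {c: set() for c in classes}
    for a, b in associations:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(classes):
        if start in seen:
            continue
        # Depth-first traversal collects every class reachable from start.
        stack, component = [start], set()
        while stack:
            c = stack.pop()
            if c in component:
                continue
            component.add(c)
            stack.extend(neighbours[c] - component)
        seen |= component
        components.append(component)
    return components

# Two connected subgraphs: no class appears in both.
parts = connected_subgraphs({"A", "B", "C", "D"}, [("A", "B"), ("C", "D")])
```

Because each class is visited once and assigned to the component in which it is first reached, no class can end up in two subgraphs, which is the property the definition requires.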

Subdatabase Object (Association) Graph:

A subdatabase object graph ($OG_s(O,E)$) contains a subset of the association
patterns of the original database object graph ($OG(O,E)$), where O is a set of
vertices representing object instances and E is a set of edges representing
associations between object instances.  An Inner-pattern (or object instance
$o_{ij}$) belongs to $OG_s$ only if $C_i \in SG_s$ and $o_{ij} \in C_i$.  An Inter-pattern or a
Complement-pattern ($o_{ij}$ === $o_{mn}$) belongs to $OG_s$ only if $C_i, C_m \in SG_s$ and
$A_{im} \in SG_s$, where $o_{ij} \in C_i$, $o_{mn} \in C_m$, and ($o_{ij}$ === $o_{mn}$) $\in A_{im}$.

The  above  conditions  state  that  a  primitive  association  pattern  should  not  be 
included  in  OGt  if  the  corresponding  classes  and/or  associations  of  the  original 
database  are  not  in  SGt. 
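
These membership conditions are simple enough to state as predicates. The sketch below is an illustration only; it assumes at most one association between any pair of classes (so that an association can be identified by the unordered pair of class names), and all function and variable names are hypothetical.

```python
def inner_pattern_allowed(cls, sub_classes):
    """An Inner-pattern (object instance) of class cls may appear only if cls is in SG_s."""
    return cls in sub_classes

def link_pattern_allowed(cls_i, cls_m, sub_classes, sub_associations):
    """An Inter- or Complement-pattern needs both classes and their association in SG_s."""
    return (cls_i in sub_classes and cls_m in sub_classes
            and frozenset((cls_i, cls_m)) in sub_associations)

sub_classes = {"A", "B", "C"}
sub_associations = {frozenset(("A", "B"))}  # the association B-C is deliberately left out
checks = (
    inner_pattern_allowed("C", sub_classes),
    link_pattern_allowed("A", "B", sub_classes, sub_associations),
    link_pattern_allowed("B", "C", sub_classes, sub_associations),
)
```

In this example an instance of C may appear on its own, but no pattern linking B and C is admitted, because their association is not part of the subdatabase schema.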

Instead  of  proving  the  completeness  theorem  as  stated  above,  we  make  the 
following  observations  and  restate  the  theorem  as  shown  below. 

First, although the SG of an O-O database may consist of more than one
connected graph, it suffices to prove the case in which the SG is a single connected
graph, since if two classes do not have a path between them in the SG, they will
not be associated with each other in any of the subdatabases.  Therefore, each
connected graph of SG can be treated as an independent database, and a subdatabase
defined on more than one connected graph of SG can be represented by the
A-Union of the subdatabases defined on the different connected graphs of SG.

Second, it suffices to prove the case in which a subdatabase consists of only one
connected subgraph of SG, although in general the $SG_s$ of a subdatabase may
contain more than one subgraph of SG.  This is because the general case can be
represented by the A-Union of the expressions for the individual subgraphs.

Third, since an O-O database is a collection of association patterns, it should
be obvious that if there exists an A-algebra expression for every association
pattern of an O-O database, then the subdatabases can be represented by the
A-Union of a subset of these association patterns.  Therefore, the completeness
theorem can be restated as follows:

Completeness Theorem:

The A-algebra is complete if there exists an expression for every
association pattern in the OG of an O-O database.

We  prove  the  above  theorem  by  induction  on  the  number  of  object  instances 
in  an  association  pattern. 

Proof.

Base:  We first show that there is an expression for the case in which an association
pattern contains a single object instance.  Since the name of a class, say $C_1$,
represents all the object instances of the class, an association pattern containing a
single object instance of that class can be represented by an A-Select operation
over the object instances of $C_1$ that selects the particular object instance of
interest under a condition B that an object instance of $C_1$ must satisfy.
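
For concreteness, in the σ(expression)[condition] form in which A-Select is written elsewhere in this chapter, the base-case expression would presumably read as follows. This is a reconstruction from the surrounding sentence rather than a quotation, and B is the condition named in that sentence.

```latex
\sigma(C_1)[B]
```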

Hypothesis:  Assume that there exists an expression for every association pattern
that contains n-1 object instances.  These n-1 object instances must form a
connected graph, i.e., there must be at least one path between any two
object instances in the graph.  Otherwise, they would have formed multiple
association patterns.

Induction:  Suppose there exists an expression for an association pattern $P^{n-1}$ which
contains n-1 object instances.  When the nth object instance is added to this
pattern, a new pattern $P^n$ containing n object instances can be formed in the
following two ways, as depicted in Figure 6.1: (a) the nth object instance belongs to
class $C_k$ and the object instances of $C_k$ do not participate in $P^{n-1}$; and (b) the nth
object instance belongs to a class, say $C_j$, some of whose object instances
participate in $P^{n-1}$.  To avoid using complicated notation, we will show the
formulations for two specific patterns depicted in Figures 6.2a and 6.2b, which
correspond to the cases of Figures 6.1a and 6.1b, respectively.  Patterns in general
forms can be formulated using the same mechanism described below.  We shall
discuss cases a and b in turn.

Case a:  When adding an object instance of $C_7$ to a pattern $P^{11}$ containing 11
object instances, various new patterns $P^{12}$ can be formed depending on the
associations between the new object instance and the existing object instances.
The new object instance can have only one association with an existing object
instance if their classes are directly connected in SG by a single association type
(we will consider later the case in which there is more than one association type
between two classes).  There are only three possible choices for the new object
instance to relate to an existing object instance: (1) the association is of no
interest, i.e., the association is not included in the pattern; (2) they are associated
with each other; (3) they are not associated with each other.  Graphically, we use a
solid line (an Inter-pattern) to represent choice 2 and a dashed line (a
Complement-pattern) to represent choice 3.  No line is drawn between the two
object instances for choice 1.  Note that at least one of the associations of the new
object instance with the existing object instances must have a choice of 2 or 3.
Otherwise, the new object instance and $P^{11}$ are two separate patterns that are
already covered by the base and the hypothesis.

To formulate an expression for the new pattern shown in Figure 6.2a, we
first transform pattern $P^{11}$ into a pattern in which the object instances of $P^{11}$ are
treated as if they were from different classes, by using aliasing names of their
original classes, as shown in Figure 6.3a.  The pattern $P^{12}_a$ in Figure 6.2a is
equivalent to the pattern in Figure 6.3a provided that the object instances of the
aliasing classes of the same class are not the same object instances.  Next, the
equivalent pattern is decomposed into a set of patterns, each of which is a
subpattern (i.e., subgraph) of the pattern in Figure 6.3a and consists of $P^{11}$, the
new object instance, and its relationship with one object instance in $P^{11}$.  If we
can derive expressions for these subpatterns individually, the A-Intersect of these
expressions will be the expression for the pattern in Figure 6.3a, which is
equivalent to the pattern in Figure 6.2a.  In this example, the pattern in Figure 6.3a
is decomposed into six subpatterns, as shown in Figure 6.4a, which can be easily
expressed as follows:

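$$E_{P^{12}_{ai}} = (E_{P^{11}}) \; \theta_i [R(C_{x_i}, C_7)] \; C_7, \qquad i = 1, 2, \ldots, 6,$$

respectively, where $C_{x_i}$ denotes the (aliasing) class of the object instance of $P^{11}$
that appears in the ith subpattern and $\theta_i$ is either * or |.  For example, the sixth
subpattern, in which the new object instance is associated with the object instance
of $C_8$, is expressed as

$$E_{P^{12}_{a6}} = (E_{P^{11}}) \; *[R(C_8, C_7)] \; C_7.$$

Here, E stands for the algebraic expression of the association pattern
specified by its subscript.  In each expression, the operation * or | is chosen
according to the type of connection between the object instances, and $E_{P^{11}}$ is
parenthesized to ensure the correct execution sequence.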

The expression for the pattern of Figure 6.2a can then be formulated by a
sequence of A-Intersect operations on the expressions of these individual patterns:

$$E_{P^{12}_{a}} = E_{P^{12}_{a1}} \bullet E_{P^{12}_{a2}} \bullet E_{P^{12}_{a3}} \bullet E_{P^{12}_{a4}} \bullet E_{P^{12}_{a5}} \bullet E_{P^{12}_{a6}}.$$
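
One way to read this chain of A-Intersect operations is as a join that keeps only those combinations in which the subpatterns agree on every object instance they share. The sketch below is a deliberately simplified model and not the A-algebra's definition of A-Intersect: each pattern is flattened to a dictionary from (aliasing) class names to object identifiers, and the class names and identifiers are made up for illustration.

```python
def intersect(left, right):
    """Toy A-Intersect over all common classes: merge patterns that agree on shared classes."""
    merged = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[k] == r[k] for k in common):
                merged.append({**l, **r})
    return merged

# Hypothetical results of two subpattern expressions. Both contain the same
# existing instances (C5_1, C8) but constrain the new C7 instance differently.
e_a1 = [{"C5_1": "o1", "C8": "o9", "C7": "x"}, {"C5_1": "o1", "C8": "o9", "C7": "y"}]
e_a2 = [{"C5_1": "o1", "C8": "o9", "C7": "y"}]

# Only the C7 instance that satisfies both subpatterns survives.
result = intersect(e_a1, e_a2)
```

In this toy reading, intersecting all six subpattern results leaves exactly the patterns whose new object instance satisfies every individual relationship at once.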

Case b:  Figure 6.2b depicts the case in which the new object instance belongs to an
existing class $C_6$ and may have associations with object instances of other classes
that have associations with $C_6$.  The formulation for the new pattern $P^{12}_b$ shown in
Figure 6.2b can be obtained similarly, as depicted in Figures 6.3b and 6.4b.  Note
that the new object instance belongs to the aliasing class $C_{6\_2}$ after the pattern
transformation process (see Figure 6.3b).  As shown in Figure 6.4b, the equivalent
pattern depicted in Figure 6.3b is decomposed into four patterns, which can be
expressed by

$$E_{P^{12}_{bi}} = (E_{P^{11}}) \; \theta_i [R(C_{x_i}, C_{6\_2})] \; C_{6\_2}, \qquad i = 1, 2, 3, 4,$$

respectively, where $C_{x_i}$ denotes the (aliasing) class of the existing object instance
in the ith pattern and $\theta_i$ is either * or |.  For example, the pattern in which the new
object instance is connected to the object instance of $C_{4\_2}$ by a Complement-pattern
is expressed as $(E_{P^{11}}) \; |[R(C_{4\_2}, C_{6\_2})] \; C_{6\_2}$.

Therefore, for the pattern $P^{12}_b$ we have the expression

$$E_{P^{12}_{b}} = E_{P^{12}_{b1}} \bullet E_{P^{12}_{b2}} \bullet E_{P^{12}_{b3}} \bullet E_{P^{12}_{b4}}.$$

However, the above expression does not exclude the case in which two object
instances in aliasing classes of $C_6$ refer to the same object instance.  Hence, it is
necessary to perform an A-Select operation to eliminate such cases, and we have

$$E_{P^{12}_{b}} = \sigma(E_{P^{12}_{b1}} \bullet E_{P^{12}_{b2}} \bullet E_{P^{12}_{b3}} \bullet E_{P^{12}_{b4}})[C_{6\_1} \neq C_{6\_2}].$$
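
The role of this final A-Select can be shown on toy data. In the sketch below, which is an illustration under assumptions and not the dissertation's implementation, a pattern is again flattened to a dictionary from aliasing class names to object identifiers (all identifiers are hypothetical), and the selection keeps only patterns in which two aliases of the same class are bound to different objects.

```python
def select_distinct(patterns, alias_a, alias_b):
    """Keep only patterns whose two aliasing classes are bound to different objects."""
    return [p for p in patterns if p[alias_a] != p[alias_b]]

patterns = [
    {"C6_1": "o61", "C6_2": "o62"},  # two different instances of C6: kept
    {"C6_1": "o61", "C6_2": "o61"},  # the same instance under both aliases: dropped
]
kept = select_distinct(patterns, "C6_1", "C6_2")
```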

So  far  we  have  shown  that  there  exists  at  least  one  expression  for  a  pattern 
consisting  of  any  number  of  object  instances.  We  note  that  there  may  exist  more 
than  one  expression  for  a  pattern.  We  illustrate  this  by  showing  an  alternative 
way  of  transforming  a  pattern  into  an  equivalent  one  so  that  different  expressions 
can  be  derived. 

Figure 6.5a shows another pattern which is equivalent to the pattern in
Figure 6.2a if, in Figure 6.5a, the object instances of the aliasing classes $C_{7\_1}$ through
$C_{7\_6}$ that participate in $P^{12}_a$ refer to the same object.  Therefore, we have an
alternative expression for $P^{12}_a$:

$$E_{P^{12}_{a}} = \sigma(\ldots((E_{P^{11}}) \; \theta_1[R(C_{x_1}, C_{7\_1})] \; C_{7\_1}) \; \theta_2[R(C_{x_2}, C_{7\_2})] \; C_{7\_2}) \; \cdots \; *[R(C_8, C_{7\_6})] \; C_{7\_6})[C_{7\_1} = C_{7\_2} = \cdots = C_{7\_6}],$$

which is a sequence of * and/or | operations (each $\theta_i$ is either * or |) on $E_{P^{11}}$ over
classes $C_{7\_i}$ ($i = 1, 2, \ldots, 6$) and their associated classes $C_{x_i}$.  The selection
condition $[C_{7\_1} = C_{7\_2} = \cdots = C_{7\_6}]$ ensures that the object instances in all
aliasing classes of $C_7$ refer to the same object.

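Similarly, the pattern in Figure 6.5b is equivalent to the pattern in Figure
6.2b if the object instances in $C_{6\_2}$ through $C_{6\_5}$ that participate in $P^{12}_b$ are the
same object and this object is different from the one in $C_{6\_1}$.  Hence an alternative
expression can be derived as follows:

$$E_{P^{12}_{b}} = \sigma(\ldots((E_{P^{11}}) \; \theta_1[R(C_{x_1}, C_{6\_2})] \; C_{6\_2}) \; *[R(C_{4\_1}, C_{6\_3})] \; C_{6\_3}) \; \cdots \; |[R(C_{4\_2}, C_{6\_5})] \; C_{6\_5})[C_{6\_2} = C_{6\_3} = C_{6\_4} = C_{6\_5} \neq C_{6\_1}],$$

where $\theta_1$ is either * or | and $C_{x_1}$ is the class associated with $C_{6\_2}$.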

We have shown that there exists an expression for every association pattern
when there is a single association between two classes.  Now we prove that this is
also true when there is more than one association between two classes.  There are
again two cases, as described in the proof above.  We only prove case a, in which
the new object instance belongs to $C_k$ and the object instances of $C_k$ do not
participate in $P^{n-1}$.  Case b can be proven using the same methodology.

Figure 6.6a shows an SG in which there are two associations between $C_{j-1}$
and $C_k$.  The two associations are denoted as $[R_1(C_{j-1}, C_k)]$ and $[R_2(C_{j-1}, C_k)]$,
respectively.  Figure 6.6b shows a pattern in which the new object instance of $C_k$ has
two associations with each object instance of $C_{j-1}$.  The associations between
object instances of $C_{j-1}$ and $C_k$ are labeled by numbers corresponding to the
associations of their classes.  To derive the algebraic expression for this pattern,
we first decompose it into two patterns, $P^n_a$ and $P^n_b$, as shown in Figure 6.6c.  The
decomposition is done by making two copies of the pattern.  In one copy the
associations labeled 2 are dropped, and in the other the associations labeled 1 are
dropped.  From the earlier discussion, we can derive expressions for these two
patterns, and the expression for the original pattern can be represented by the
A-Intersect of the two:

$$E_{P^n} = E_{P^n_a} \bullet E_{P^n_b}.$$

To ensure that the A-Intersect operation will produce the pattern as required, the
same object instance in the two copies should use the same aliasing class name
when the expressions $E_{P^n_a}$ and $E_{P^n_b}$ are formulated.

Generally, if the new object instance of $C_k$ has multiple associations with
object instances of several classes, the association pattern is decomposed into m
patterns, where m is the maximum number of associations $C_k$ has with another
class.

Since it has been shown that we can formulate algebraic expressions for all
possible patterns in which object instances are associated or nonassociated, and the
A-Union of these expressions forms a single expression for the subdatabase of
interest, we have shown by induction that the A-algebra is complete.  □

Figure 6.1  Two ways of forming new patterns: (a) the nth object is in $C_k$; (b) the nth object is in $C_j$

Figure 6.2  Two specific examples of new patterns: (a) the 12th object is in $C_7$; (b) the 12th object is in $C_6$

Figure 6.3  Equivalent patterns: panels (a) and (b)

Figure 6.4  Decomposed patterns

Figure 6.5  Other equivalent patterns: panels (a) and (b)

Figure 6.6  New object instance having multiple associations with those of $C_{j-1}$: (a) two classes have multiple associations, $[R_1(C_{j-1}, C_k)]$ and $[R_2(C_{j-1}, C_k)]$; (b) two objects have multiple associations in a pattern; (c) the pattern is first decomposed into two patterns


CHAPTER  7 
CONCLUSION 


Object-oriented DBMSs and their underlying models exhibit several desirable
features that are suitable for modeling and processing complex objects found in
more advanced database applications.  However, they still do not have a solid
mathematical foundation.  Such a foundation is important for the efficient
manipulation of O-O databases and for the design of high-level query languages to
ease the user's task in accessing and manipulating O-O databases.

In this dissertation, we have presented an algebra for O-O database
processing based on a uniform representation of object instances and their
associations in an O-O database: association patterns.  Nine algebra operators have
been introduced for manipulating patterns of both heterogeneous and homogeneous
structures.  The closure property of the algebra allows the result of an algebraic
expression to be further processed by the algebra.

Several mathematical properties of the A-algebra operators have been studied
and formally proven.  Their utility in query decomposition and optimization has
been demonstrated.  The A-algebra is complete in the sense that all possible
subdatabases that are derivable from an O-O database can be expressed in A-algebra
expressions.

The A-algebra has been used in the design and implementation of a
high-level object-oriented query language, OQL, for processing O-O databases
[ALA89b, WU89].  A graphic interface for the language and a prototype
knowledge base management system based on the O-O semantic association model
OSAM* [SU86 and SU89] are presented in [DS088, TY88, SU88, LAM89, PAN89,
CHU90, SIN90].


APPENDIX

The formal proofs of the mathematical properties of the A-algebra operators are
given below.

A.  Commutativity

(1)  $\alpha *[R(A,B)]\, \beta = \beta *[R(B,A)]\, \alpha$  (5.1)

Proof:  If a pattern in $\alpha$ can be concatenated with a pattern in $\beta$ over an
Inter-pattern $a_i b_j$, then the pattern in $\beta$ can be concatenated with that pattern in $\alpha$
over the Inter-pattern $b_j a_i$.  Since patterns are non-directional, i.e., $a_i b_j = b_j a_i$, the
left-hand side and the right-hand side of the equation produce the same result.  On
the other hand, if an $\alpha$ pattern cannot be concatenated with a $\beta$ pattern by the
operation on the left-hand side, then the same $\beta$ pattern cannot be concatenated with
that $\alpha$ pattern by the operation on the right-hand side.  □
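
Property 5.1 can be checked on a toy model. The sketch below is a simplification made up for illustration, not the formal definition of Associate: a pattern is a frozenset of (class, object) pairs, an association-set is a Python set of patterns, and the Inter-patterns of R are stored as unordered pairs of object identifiers, which is what makes them non-directional.

```python
def associate(alpha, beta, cls_a, cls_b, links):
    """Toy Associate: concatenate a pattern of alpha with a pattern of beta when an
    instance of cls_a in the first is linked to an instance of cls_b in the second."""
    result = set()
    for p in alpha:
        for q in beta:
            if any(ca == cls_a and cb == cls_b and frozenset((a, b)) in links
                   for ca, a in p for cb, b in q):
                result.add(p | q)
    return result

alpha = {frozenset({("A", "a1")}), frozenset({("A", "a2")})}
beta = {frozenset({("B", "b1")}), frozenset({("B", "b2")})}
links = {frozenset(("a1", "b1")), frozenset(("a2", "b1"))}  # non-directional Inter-patterns

lhs = associate(alpha, beta, "A", "B", links)  # alpha *[R(A,B)] beta
rhs = associate(beta, alpha, "B", "A", links)  # beta *[R(B,A)] alpha
```

Both calls produce the same two concatenated patterns, and b2, which is linked to nothing, appears in neither result.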

(2)  $\alpha \,|[R(A,B)]\, \beta = \beta \,|[R(B,A)]\, \alpha$  (5.2)

Proof:  A Complement-pattern is non-directional.  If a Complement-pattern
$\overline{a_i b_j}$ connects an $\alpha$ pattern with a $\beta$ pattern, these two patterns together with the
Complement-pattern $\overline{a_i b_j}$ will all be retained in the results of the expressions on both
sides of the equation.  For the same reason, a new pattern which cannot be produced
by the operation on the left-hand side of the equation cannot be produced by the
operation on the right-hand side.  □

(3)  $\alpha \,![R(A,B)]\, \beta = \beta \,![R(B,A)]\, \alpha$  (5.3)

Proof:  According to the connections between patterns of $\alpha$ and $\beta$ through some
Inter-patterns, $\alpha$ and $\beta$ can each be decomposed into the A-Union of two subsets of
patterns:

a)  $\alpha = \alpha' + \alpha''$ and $\beta = \beta' + \beta''$

where $\alpha'$ represents the subset of $\alpha$ patterns that can be concatenated with the $\beta$
patterns and $\alpha''$ represents the subset of $\alpha$ patterns that cannot be concatenated
with $\beta$ patterns.  The decomposition of $\beta$ can be interpreted similarly.

Assume that $\alpha''\beta''$ and $\beta''\alpha''$ are used to denote the new patterns produced by the
NonAssociate operations on the left- and right-hand sides of the equation,
respectively.  Each of the new patterns consists of one $\alpha$ pattern, one $\beta$ pattern, and
a Complement-pattern which connects the two.  By the definition of the NonAssociate
operation, we have

$$\text{left-hand side} = (\alpha' + \alpha'') \,![R(A,B)]\, (\beta' + \beta'') = \begin{cases} \alpha'' & \text{if } \beta'' = \phi \\ \beta'' & \text{if } \alpha'' = \phi \\ \alpha''\beta'' & \text{otherwise} \end{cases}$$

$$\text{right-hand side} = (\beta' + \beta'') \,![R(B,A)]\, (\alpha' + \alpha'') = \begin{cases} \alpha'' & \text{if } \beta'' = \phi \\ \beta'' & \text{if } \alpha'' = \phi \\ \beta''\alpha'' & \text{otherwise} \end{cases}$$

Since a Complement-pattern is non-directional, i.e., $\alpha''\beta'' = \beta''\alpha''$, the commutativity
holds for all cases.  □

(4)  $\alpha \bullet\{X\}\, \beta = \beta \bullet\{X\}\, \alpha$  (5.4)

Proof:  If the Inner-patterns (object instances) of the classes specified in {X}
contained in an $\alpha$ pattern are common to a $\beta$ pattern, the new pattern which is the
intersection of the two patterns will be produced by both sides of the equation.  On
the other hand, if an $\alpha$ pattern does not intersect with a $\beta$ pattern by the
operation on the left-hand side of the equation, the same $\beta$ pattern will not intersect
with that $\alpha$ pattern by the operation on the right-hand side.  □

(5)  $\alpha + \beta = \beta + \alpha$  (5.5)

Proof:  Since the A-Union operation simply lumps the patterns named by two operands
into a single association-set and the patterns in an association-set are not ordered,
both sides of the equation will produce the same result.  □


B.  Associativity

(1)  $(\alpha_{\{X\}} *[R(CL_1,CL_2)]\, \beta_{\{Y\}}) *[R(CL_3,CL_4)]\, \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(CL_1,CL_2)]\, (\beta_{\{Y\}} *[R(CL_3,CL_4)]\, \gamma_{\{Z\}})$  (5.6)

if $CL_3 \notin \{X\} \wedge CL_2 \notin \{Z\}$.

Proof:  The associativity holds only under the stated condition.  The condition states
that $\alpha$ does not contain Inner-patterns of class $CL_3$ and $\gamma$ does not contain
Inner-patterns (or object instances) of class $CL_2$, so that $\alpha$ will have no effect on the
operation $*[R(CL_3,CL_4)]$ on the left-hand side and $\gamma$ will have no effect on the
operation $*[R(CL_1,CL_2)]$ on the right-hand side.  Given that the above condition
holds, $\alpha$, $\beta$, and $\gamma$ can be decomposed as follows:

a)  $\alpha = \alpha' + \alpha'' + \alpha'''$

where $\alpha'$ represents a subset of $\alpha$ patterns which can be concatenated with a
subset of $\beta$ patterns and thereafter be concatenated (through $\beta$ patterns) with a
subset of $\gamma$ patterns, $\alpha''$ represents a subset of $\alpha$ patterns which can be
concatenated with a subset of $\beta$ patterns which, however, cannot be concatenated
with any $\gamma$ pattern, and $\alpha'''$ represents a subset of patterns which either do not
have the Inner-patterns of $CL_1$ or cannot be concatenated with any $\beta$ pattern.
Note that an $\alpha$ pattern may belong to both $\alpha'$ and $\alpha''$.

b)  $\beta = \beta' + \beta'' + \beta''' + \beta''''$

where $\beta'$ can be concatenated with $\alpha$ and $\gamma$, $\beta''$ can be concatenated with $\alpha$ but
not with $\gamma$, $\beta'''$ can be concatenated with $\gamma$ but not with $\alpha$, and $\beta''''$ cannot be
concatenated with either $\alpha$ or $\gamma$.  Note that the patterns of $\beta'$, $\beta''$, $\beta'''$, and $\beta''''$ are
mutually exclusive.

c)  $\gamma = \gamma' + \gamma'' + \gamma'''$

where $\gamma'$, $\gamma''$, and $\gamma'''$ have interpretations similar to those of $\alpha'$, $\alpha''$, and $\alpha'''$,
respectively.

If $\alpha'\beta'$, $\alpha''\beta''$, $\beta'\gamma'$, $\beta'''\gamma''$, and $\alpha'\beta'\gamma'$ are used to represent the results of the
Associate operations, then according to the definition of Associate we have

$$\begin{aligned}
\text{left-hand side} &= ((\alpha' + \alpha'' + \alpha''') *[R(CL_1,CL_2)]\, (\beta' + \beta'' + \beta''' + \beta'''')) *[R(CL_3,CL_4)]\, (\gamma' + \gamma'' + \gamma''') \\
&= (\alpha'\beta' + \alpha''\beta'') *[R(CL_3,CL_4)]\, (\gamma' + \gamma'' + \gamma''') \\
&= \alpha'\beta'\gamma'
\end{aligned}$$

$$\begin{aligned}
\text{right-hand side} &= (\alpha' + \alpha'' + \alpha''') *[R(CL_1,CL_2)]\, ((\beta' + \beta'' + \beta''' + \beta'''') *[R(CL_3,CL_4)]\, (\gamma' + \gamma'' + \gamma''')) \\
&= (\alpha' + \alpha'' + \alpha''') *[R(CL_1,CL_2)]\, (\beta'\gamma' + \beta'''\gamma'') \\
&= \alpha'\beta'\gamma' \qquad \Box
\end{aligned}$$

(2)  $(\alpha_{\{X\}} \,|[R(CL_1,CL_2)]\, \beta_{\{Y\}}) \,|[R(CL_3,CL_4)]\, \gamma_{\{Z\}} = \alpha_{\{X\}} \,|[R(CL_1,CL_2)]\, (\beta_{\{Y\}} \,|[R(CL_3,CL_4)]\, \gamma_{\{Z\}})$  (5.7)

if $CL_3 \notin \{X\} \wedge CL_2 \notin \{Z\}$.

Proof:  For reasons similar to those given in the discussion of the associativity of the
* operator, $\alpha$, $\beta$, and $\gamma$ can be decomposed as follows:

a)  $\alpha = \alpha' + \alpha'' + \alpha'''$

where $\alpha'$ can be connected to $\beta$ patterns by Complement-patterns and then be
connected to $\gamma$ patterns, $\alpha''$ can be connected to $\beta$ patterns by
Complement-patterns but cannot be further connected to $\gamma$ patterns, and $\alpha'''$ either
has no Inner-patterns of $CL_1$ or cannot be connected to any $\beta$ pattern by
Complement-patterns.  Also, the patterns of $\alpha'$ and $\alpha''$ may not be mutually
exclusive.

b)  $\beta = \beta' + \beta'' + \beta''' + \beta''''$

where $\beta'$ can be connected to $\alpha$ and $\gamma$ patterns by Complement-patterns, $\beta''$ can
be connected to $\alpha$ patterns by Complement-patterns but not to $\gamma$ patterns, $\beta'''$
can be connected to $\gamma$ patterns by Complement-patterns but not to $\alpha$ patterns,
and $\beta''''$ cannot be connected to the patterns of either $\alpha$ or $\gamma$.  Also, the patterns of
$\beta'$, $\beta''$, $\beta'''$, and $\beta''''$ are mutually exclusive.

c)  $\gamma = \gamma' + \gamma'' + \gamma'''$

where $\gamma'$, $\gamma''$, and $\gamma'''$ have interpretations similar to those of $\alpha'$, $\alpha''$, and $\alpha'''$,
respectively.

Then, by the definition of the A-Complement operation, we have

$$\begin{aligned}
\text{left-hand side} &= ((\alpha' + \alpha'' + \alpha''') \,|[R(CL_1,CL_2)]\, (\beta' + \beta'' + \beta''' + \beta'''')) \,|[R(CL_3,CL_4)]\, (\gamma' + \gamma'' + \gamma''') \\
&= (\alpha'\beta' + \alpha''\beta'') \,|[R(CL_3,CL_4)]\, (\gamma' + \gamma'' + \gamma''') \\
&= \alpha'\beta'\gamma'
\end{aligned}$$

$$\begin{aligned}
\text{right-hand side} &= (\alpha' + \alpha'' + \alpha''') \,|[R(CL_1,CL_2)]\, ((\beta' + \beta'' + \beta''' + \beta'''') \,|[R(CL_3,CL_4)]\, (\gamma' + \gamma'' + \gamma''')) \\
&= (\alpha' + \alpha'' + \alpha''') \,|[R(CL_1,CL_2)]\, (\beta'\gamma' + \beta'''\gamma'') \\
&= \alpha'\beta'\gamma'
\end{aligned}$$

where $\alpha'\beta'$, $\alpha''\beta''$, $\beta'\gamma'$, $\beta'''\gamma''$, and $\alpha'\beta'\gamma'$ represent the results of the A-Complement
operations.  □

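(3)  $(\alpha_{\{X\}} \bullet\{W_1\}\, \beta_{\{Y\}}) \bullet\{W_2\}\, \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet\{W_1\}\, (\beta_{\{Y\}} \bullet\{W_2\}\, \gamma_{\{Z\}})$  (5.8)

where $\{W_1 - W_2\} \cap \{Z\} = \phi \wedge \{W_2 - W_1\} \cap \{X\} = \phi$.

Proof:  The condition ensures that the operation $\bullet\{W_1\}$ operates only on patterns of
$\alpha$ and $\beta$ and $\bullet\{W_2\}$ operates only on patterns of $\beta$ and $\gamma$.  The following figure shows
four possible cases in which three patterns intersect with one another.  It should be
clear that the associativity does not hold for case (d), because it violates the
condition, i.e., the second A-Intersect operation operates on $\alpha$ and $\gamma$.  When the
condition is true, the proof is similar to the proofs for the above two associative
properties, i.e., by decomposing $\alpha$, $\beta$, and $\gamma$ accordingly.

[Figure: four cases, (a) through (d), in which three patterns $\alpha$, $\beta$, and $\gamma$ intersect with one another]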

(4)  $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$  (5.9)

Proof:  Since the A-Union operation simply lumps two association-sets into one and
the patterns in a set are not ordered, the order of performing A-Union operations on
a number of association-sets will have no effect on the final result.  □


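D.  Distributivity

(1)  $\alpha *[R(A,B)]\, (\beta + \gamma) = \alpha *[R(A,B)]\, \beta + \alpha *[R(A,B)]\, \gamma$  (5.15)

Proof:  First, $\alpha$, $\beta$, and $\gamma$ can be decomposed as follows.

a)  $\alpha = \alpha' + \alpha'' + \alpha'''$

where $\alpha'$ can be concatenated with $\beta$, $\alpha''$ can be concatenated with $\gamma$, and $\alpha'''$
cannot be concatenated with either $\beta$ or $\gamma$.  Note that an $\alpha$ pattern may belong
to both $\alpha'$ and $\alpha''$.

b)  $\beta = \beta' + \beta''$

where $\beta'$ can be concatenated with $\alpha$ but $\beta''$ cannot.

c)  $\gamma = \gamma' + \gamma''$

where $\gamma'$ can be concatenated with $\alpha$ but $\gamma''$ cannot.

By the definition of the Associate operation, we have

$$\begin{aligned}
\text{left-hand side} &= (\alpha' + \alpha'' + \alpha''') *[R(A,B)]\, (\beta' + \beta'' + \gamma' + \gamma'') \\
&= \alpha'\beta' + \alpha''\gamma'
\end{aligned}$$

$$\begin{aligned}
\text{right-hand side} &= (\alpha' + \alpha'' + \alpha''') *[R(A,B)]\, (\beta' + \beta'') + (\alpha' + \alpha'' + \alpha''') *[R(A,B)]\, (\gamma' + \gamma'') \\
&= \alpha'\beta' + \alpha''\gamma' \qquad \Box
\end{aligned}$$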


(2)  $\alpha \,|[R(A,B)]\, (\beta + \gamma) = \alpha \,|[R(A,B)]\, \beta + \alpha \,|[R(A,B)]\, \gamma$  (5.16)

Proof:  $\alpha$, $\beta$, and $\gamma$ can be decomposed as follows.

a)  $\alpha = \alpha' + \alpha'' + \alpha'''$

where $\alpha'$ contains patterns that are connected to $\beta$ by Complement-patterns, $\alpha''$
contains patterns that are connected to $\gamma$ by Complement-patterns, and $\alpha'''$
cannot be connected to either $\beta$ or $\gamma$ by Complement-patterns.  Note that an $\alpha$
pattern may belong to both $\alpha'$ and $\alpha''$.

b)  $\beta = \beta' + \beta''$

where $\beta'$ can be connected to $\alpha$ by Complement-patterns but $\beta''$ cannot.

c)  $\gamma = \gamma' + \gamma''$

where $\gamma'$ can be connected to $\alpha$ by Complement-patterns but $\gamma''$ cannot.

By the definition of the A-Complement operation, we have

$$\begin{aligned}
\text{left-hand side} &= (\alpha' + \alpha'' + \alpha''') \,|[R(A,B)]\, (\beta' + \beta'' + \gamma' + \gamma'') \\
&= \alpha'\beta' + \alpha''\gamma'
\end{aligned}$$

$$\begin{aligned}
\text{right-hand side} &= (\alpha' + \alpha'' + \alpha''') \,|[R(A,B)]\, (\beta' + \beta'') + (\alpha' + \alpha'' + \alpha''') \,|[R(A,B)]\, (\gamma' + \gamma'') \\
&= \alpha'\beta' + \alpha''\gamma' \qquad \Box
\end{aligned}$$


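(3)  $\alpha \bullet\{X\}\, (\beta + \gamma) = \alpha \bullet\{X\}\, \beta + \alpha \bullet\{X\}\, \gamma$  (5.17)

Proof:  $\alpha$, $\beta$, and $\gamma$ can be decomposed as follows.

a)  $\alpha = \alpha' + \alpha'' + \alpha'''$

where $\alpha'$ intersects with $\beta$, $\alpha''$ intersects with $\gamma$, and $\alpha'''$ does not intersect with
either $\beta$ or $\gamma$.  Note that an $\alpha$ pattern may belong to both $\alpha'$ and $\alpha''$.

b)  $\beta = \beta' + \beta''$

where $\beta'$ intersects with $\alpha$ but $\beta''$ does not.

c)  $\gamma = \gamma' + \gamma''$

where $\gamma'$ intersects with $\alpha$ but $\gamma''$ does not.

By the definition of the A-Intersect operation, we have

$$\begin{aligned}
\text{left-hand side} &= (\alpha' + \alpha'' + \alpha''') \bullet\{X\}\, (\beta' + \beta'' + \gamma' + \gamma'') \\
&= \alpha'\beta' + \alpha''\gamma'
\end{aligned}$$

$$\begin{aligned}
\text{right-hand side} &= (\alpha' + \alpha'' + \alpha''') \bullet\{X\}\, (\beta' + \beta'') + (\alpha' + \alpha'' + \alpha''') \bullet\{X\}\, (\gamma' + \gamma'') \\
&= \alpha'\beta' + \alpha''\gamma' \qquad \Box
\end{aligned}$$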
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(4)  aw*\R{CLvCL2)]{P(Y).{W}1{z)) 

=  a4R(CLvCL2)}P{y)»{WUX\a{x)4R(CLvCL2)}-({z}         (5.18) 

(5)  a{X}\[R(CLvCL2)}(P{r).{W}l{z}) 

=  a\[R{CLvCL2)\P(Y).{\^X)am\[R(CLvCL2))1{z)         (5.19) 
The above two distributive properties hold when the following conditions are true:

i)  $CL_2 \in \{W\}$;
ii)  $X \cap Y = X \cap W = \phi$; and
iii)  $\alpha$ is a homogeneous association-set.

The first condition ensures that the operations Associate, A-Complement, and NonAssociate will operate on the common class of $\beta$ and $\gamma$ as shown in (a) of the following figure. Otherwise, the distributions of these operations to $\beta$ and $\gamma$ do not make sense as shown in (b) and (c). The second condition ensures that $\alpha$ patterns must not intersect with any pattern of either $\beta$ or $\gamma$ so that the $\bullet\{X \cup W\}$ operations on the right-hand sides of the equations will examine the intersections on the portions of $\alpha$ and $\gamma$ separately. The third condition ensures that, on the right-hand sides of the equations, only those patterns that have the same $\alpha$ pattern will intersect and be retained in the result.

[Figure: association patterns of $\beta$ and $\gamma$ over classes $CL_1$ and $CL_2$; panels (a), (b).]

We shall only give the proof of 5.18; 5.19 can be proved using the same technique.

When the conditions are true, $\alpha$, $\beta$ and $\gamma$ can be decomposed as follows.

a)  $\alpha = \alpha' + \alpha'' + \alpha''' + \alpha''''$

where $\alpha'$ can be concatenated with $\beta$ and $\gamma$, $\alpha''$ can be concatenated with $\beta$ but not with $\gamma$, $\alpha'''$ can be concatenated with $\gamma$ but not with $\beta$, and $\alpha''''$ cannot be concatenated with either $\beta$ or $\gamma$. Note that $\alpha'$, $\alpha''$, $\alpha'''$, and $\alpha''''$ are mutually exclusive.

b)  $\beta = \beta' + \beta'' + \beta''' + \beta''''$

where $\beta'$ can be concatenated with $\alpha$ and does intersect with $\gamma$, $\beta''$ can be concatenated with $\alpha$ but does not intersect with $\gamma$, $\beta'''$ cannot be concatenated with $\alpha$ but does intersect with $\gamma$, and $\beta''''$ can neither be concatenated with $\alpha$ nor intersect with $\gamma$. Note that $\beta'$, $\beta''$, $\beta'''$, and $\beta''''$ are also mutually exclusive.

c)  $\gamma = \gamma' + \gamma'' + \gamma''' + \gamma''''$

where $\gamma'$, $\gamma''$, $\gamma'''$ and $\gamma''''$ have interpretations similar to those of $\beta'$, $\beta''$, $\beta'''$, and $\beta''''$, respectively.

By the definition of the operations of Associate and A-Intersect we have

left-hand side $= (\alpha' + \alpha'' + \alpha''' + \alpha'''') *[R(CL_1,CL_2)] ((\beta' + \beta'' + \beta''' + \beta'''') \bullet\{W\} (\gamma' + \gamma'' + \gamma''' + \gamma''''))$
$= (\alpha' + \alpha'' + \alpha''' + \alpha'''') *[R(CL_1,CL_2)] (\beta'\gamma' + \beta'''\gamma''')$

Since $CL_2 \in \{W\}$, $\beta'\gamma'''$ and $\beta'''\gamma'$ cannot be produced by the $\bullet\{W\}$ operator according to the decompositions of $\beta$ and $\gamma$. Otherwise, $\gamma'''$ (or $\beta'''$) must contain the same Inner-pattern of $CL_2$ as contained in $\beta'$ (or $\gamma'$) and must be able to concatenate with $\alpha$. Applying the distributive property 5.15, we obtain

$= \alpha' *[R(CL_1,CL_2)] \beta'\gamma' + \alpha' *[R(CL_1,CL_2)] \beta'''\gamma'''$
$+ \alpha'' *[R(CL_1,CL_2)] \beta'\gamma' + \alpha'' *[R(CL_1,CL_2)] \beta'''\gamma'''$
$+ \alpha''' *[R(CL_1,CL_2)] \beta'\gamma' + \alpha''' *[R(CL_1,CL_2)] \beta'''\gamma'''$
$+ \alpha'''' *[R(CL_1,CL_2)] \beta'\gamma' + \alpha'''' *[R(CL_1,CL_2)] \beta'''\gamma'''$

Based on the decompositions of $\alpha$, $\beta$, and $\gamma$, only the first item will produce new patterns and is retained. Hence,

$= \alpha' *[R(CL_1,CL_2)] \beta'\gamma'$
$= \alpha'\beta'\gamma'$

On the right-hand side of the equation we have

right-hand side $= ((\alpha' + \alpha'' + \alpha''' + \alpha'''') *[R(CL_1,CL_2)] (\beta' + \beta'' + \beta''' + \beta''''))$
$\bullet\{X \cup W\} ((\alpha' + \alpha'' + \alpha''' + \alpha'''') *[R(CL_1,CL_2)] (\gamma' + \gamma'' + \gamma''' + \gamma''''))$
$= (\alpha'\beta' + \alpha'\beta'' + \alpha''\beta' + \alpha''\beta'') \bullet\{X \cup W\} (\alpha'\gamma' + \alpha'\gamma'' + \alpha'''\gamma' + \alpha'''\gamma'')$

Applying the distributive property 5.17 to both operands, we have

right-hand side $= \alpha'\beta' \bullet\{X \cup W\} \alpha'\gamma' + \alpha'\beta' \bullet\{X \cup W\} \alpha'\gamma'' + \alpha'\beta' \bullet\{X \cup W\} \alpha'''\gamma' + \alpha'\beta' \bullet\{X \cup W\} \alpha'''\gamma''$
$+ \alpha'\beta'' \bullet\{X \cup W\} \alpha'\gamma' + \alpha'\beta'' \bullet\{X \cup W\} \alpha'\gamma'' + \alpha'\beta'' \bullet\{X \cup W\} \alpha'''\gamma' + \alpha'\beta'' \bullet\{X \cup W\} \alpha'''\gamma''$
$+ \alpha''\beta' \bullet\{X \cup W\} \alpha'\gamma' + \alpha''\beta' \bullet\{X \cup W\} \alpha'\gamma'' + \alpha''\beta' \bullet\{X \cup W\} \alpha'''\gamma' + \alpha''\beta' \bullet\{X \cup W\} \alpha'''\gamma''$
$+ \alpha''\beta'' \bullet\{X \cup W\} \alpha'\gamma' + \alpha''\beta'' \bullet\{X \cup W\} \alpha'\gamma'' + \alpha''\beta'' \bullet\{X \cup W\} \alpha'''\gamma' + \alpha''\beta'' \bullet\{X \cup W\} \alpha'''\gamma''$

Of the sixteen items, only the first one is retained. The rest of the items are dropped because they do not intersect either over classes in $\{X\}$ or over classes in $\{W\}$. Therefore,

right-hand side $= \alpha'\beta' \bullet\{X \cup W\} \alpha'\gamma'$
$= \alpha'\beta'\gamma'$  □

E.  Other Properties


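(1)  $\sigma_1(\sigma_2(\alpha)[P_2])[P_1] = \sigma_2(\sigma_1(\alpha)[P_1])[P_2] = \sigma(\alpha)[P_1 \wedge P_2]$    (5.20)

Proof: $\alpha$ can be decomposed into $\alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ satisfies $P_1$ and $P_2$, $\alpha''$ only satisfies $P_1$, $\alpha'''$ only satisfies $P_2$, and $\alpha''''$ does not satisfy either $P_1$ or $P_2$.

$\sigma_2(\sigma_1(\alpha)[P_1])[P_2] = \sigma_2(\alpha' + \alpha'')[P_2] = \alpha'$

$\sigma(\alpha)[P_1 \wedge P_2] = \alpha'$  □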
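The cascade of selections in 5.20 can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: a pattern is taken to be a frozenset of (class, object-id) pairs, A-Select is modelled as filtering a set of patterns by a Boolean predicate, and the helper names `select` and `a_oid` as well as the two predicates are hypothetical.

```python
def select(alpha, pred):
    """Toy A-Select: keep the patterns of alpha that satisfy pred."""
    return {p for p in alpha if pred(p)}


def a_oid(pattern):
    """Return the object id of the class-A instance in a pattern."""
    return next(oid for cls, oid in pattern if cls == "A")


# Hypothetical association-set: four single-instance patterns of class A.
alpha = {frozenset({("A", n)}) for n in (1, 2, 3, 4)}


def p1(p):
    return a_oid(p) % 2 == 0   # P1: the A object id is even


def p2(p):
    return a_oid(p) > 2        # P2: the A object id exceeds 2


first_p1 = select(select(alpha, p1), p2)
first_p2 = select(select(alpha, p2), p1)
conjunct = select(alpha, lambda p: p1(p) and p2(p))
```

Here the pattern with object 4 is the only member of $\alpha'$, object 2 falls in $\alpha''$, object 3 in $\alpha'''$, and object 1 in $\alpha''''$, so all three expressions reduce to the single pattern with object 4.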

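(2)  $\Pi(\sigma(\alpha)[P])[\mathcal{E};\mathcal{T}] = \sigma(\Pi(\alpha)[\mathcal{E};\mathcal{T}])[P]$    $(P \subseteq \mathcal{E})$    (5.27)

Proof: First, $\alpha$ is decomposed into $\alpha' + \alpha''$, where $\alpha'$ satisfies the selection condition but $\alpha''$ does not. Then, let $\beta'$ and $\beta''$ represent the results of the projection operation corresponding to $\alpha'$ and $\alpha''$, respectively. Since $P \subseteq \mathcal{E}$, $\beta'$ satisfies $P$ but $\beta''$ does not and we have

$\Pi(\sigma(\alpha)[P])[\mathcal{E};\mathcal{T}] = \Pi(\alpha')[\mathcal{E};\mathcal{T}] = \beta'$

$\sigma(\Pi(\alpha)[\mathcal{E};\mathcal{T}])[P] = \sigma(\beta' + \beta'')[P] = \beta'$  □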

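(3)  $\sigma(\alpha *[R(A,B)] \beta)[P_1 \wedge P_2] = \sigma_1(\alpha)[P_1] *[R(A,B)] \sigma_2(\beta)[P_2]$    (5.28)

where $P_1$ and $P_2$ are applicable to $\alpha$ and $\beta$, respectively.

Proof: First, $\alpha$ is decomposed into $\alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ and $\alpha''$ satisfy $P_1$ but $\alpha'''$ and $\alpha''''$ do not; and $\alpha'$ and $\alpha'''$ can be concatenated with some $\beta$ patterns but $\alpha''$ and $\alpha''''$ cannot. $\beta$ can be decomposed into $\beta' + \beta'' + \beta''' + \beta''''$ with a similar interpretation. Therefore, we have

$\sigma(\alpha *[R(A,B)] \beta)[P_1 \wedge P_2] = \sigma(\alpha'\beta' + \alpha'\beta''' + \alpha'''\beta' + \alpha'''\beta''')[P_1 \wedge P_2]$
$= \alpha'\beta'$

$\sigma_1(\alpha)[P_1] *[R(A,B)] \sigma_2(\beta)[P_2] = (\alpha' + \alpha'') *[R(A,B)] (\beta' + \beta'')$
$= \alpha'\beta'$  □

(4)  $\sigma(\alpha *[R(A,B)] \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] *[R(A,B)] \beta + \alpha *[R(A,B)] \sigma(\beta)[P_2]$    (5.31)

where $P_1$ and $P_2$ are applicable to $\alpha$ and $\beta$, respectively.

Proof: $\alpha$ and $\beta$ are decomposed as in the above proof. Thus, we have

$\sigma(\alpha *[R(A,B)] \beta)[P_1 \vee P_2] = \sigma(\alpha'\beta' + \alpha'\beta''' + \alpha'''\beta' + \alpha'''\beta''')[P_1 \vee P_2]$
$= \alpha'\beta' + \alpha'\beta''' + \alpha'''\beta'$

$\sigma(\alpha)[P_1] *[R(A,B)] \beta + \alpha *[R(A,B)] \sigma(\beta)[P_2]$
$= (\alpha' + \alpha'') *[R(A,B)] (\beta' + \beta'' + \beta''' + \beta'''') + (\alpha' + \alpha'' + \alpha''' + \alpha'''') *[R(A,B)] (\beta' + \beta'')$
$= \alpha'\beta' + \alpha'\beta''' + \alpha'''\beta'$  □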


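(5)  $\sigma(\alpha - \beta)[P] = \sigma(\alpha)[P] - \beta$    (5.34)

Proof: We decompose $\alpha$ into $\alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ and $\alpha''$ satisfy $P$ but $\alpha'''$ and $\alpha''''$ do not; and $\alpha'$ and $\alpha'''$ contain $\beta$ patterns but $\alpha''$ and $\alpha''''$ do not. Then, we have

$\sigma(\alpha - \beta)[P] = \sigma(\alpha'' + \alpha'''')[P] = \alpha''$

$\sigma(\alpha)[P] - \beta = (\alpha' + \alpha'') - \beta = \alpha''$  □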
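The four-way decomposition used in the proof of 5.34 can be traced on a concrete instance. The Python sketch below is an illustration under assumptions that are not taken from the text: patterns are frozensets of (class, object-id) pairs, "an $\alpha$ pattern contains a $\beta$ pattern" is modelled as the subset relation, and the names `select`, `a_difference`, and `pred` are hypothetical.

```python
def select(alpha, pred):
    """Toy A-Select: keep the patterns of alpha that satisfy pred."""
    return {p for p in alpha if pred(p)}


def a_difference(alpha, beta):
    """Toy A-Difference: keep alpha patterns containing no beta pattern."""
    return {a for a in alpha if not any(b <= a for b in beta)}


# One pattern for each part of the decomposition in the proof.
p1 = frozenset({("A", 1), ("B", 5)})   # satisfies P, contains a beta pattern
p2 = frozenset({("A", 2), ("B", 6)})   # satisfies P, contains no beta pattern
p3 = frozenset({("A", 3), ("B", 5)})   # fails P, contains a beta pattern
p4 = frozenset({("A", 4), ("B", 7)})   # fails P, contains no beta pattern
alpha = {p1, p2, p3, p4}
beta = {frozenset({("B", 5)})}


def pred(p):
    # P: the pattern has a class-A instance with object id at most 2.
    return any(cls == "A" and oid <= 2 for cls, oid in p)


lhs = select(a_difference(alpha, beta), pred)
rhs = a_difference(select(alpha, pred), beta)
```

Both orders of evaluation leave only `p2`, the pattern that plays the role of $\alpha''$.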

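(6)  $\sigma(\alpha + \beta)[P] = \sigma(\alpha)[P] + \sigma(\beta)[P]$    (5.35)

Proof: Suppose $\alpha$ and $\beta$ are decomposed into subsets $\alpha'$ and $\alpha''$ and $\beta'$ and $\beta''$, respectively, where $\alpha'$ and $\beta'$ satisfy $P$ but $\alpha''$ and $\beta''$ do not. By the definition of the A-Select operation, we have

$\sigma(\alpha + \beta)[P] = \alpha' + \beta' = \sigma(\alpha)[P] + \sigma(\beta)[P]$  □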

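(7)  $\sigma(\alpha + \beta)[P_1 \vee P_2] = \sigma_1(\alpha)[P_1] + \sigma_2(\beta)[P_2]$    (5.36)

where $P_1$ and $P_2$ are applicable to $\alpha$ and $\beta$, respectively.

Proof: Suppose $\alpha$ and $\beta$ are decomposed into subsets $\alpha'$ and $\alpha''$ and $\beta'$ and $\beta''$, respectively, where $\alpha'$ satisfies $P_1$ but $\alpha''$ does not and $\beta'$ satisfies $P_2$ but $\beta''$ does not. By the definition of the A-Select operation, we have

$\sigma(\alpha + \beta)[P_1 \vee P_2] = \alpha' + \beta' = \sigma_1(\alpha)[P_1] + \sigma_2(\beta)[P_2]$  □

(8)  $\Pi(\alpha + \beta)[\mathcal{E};\mathcal{T}] = \Pi(\alpha)[\mathcal{E};\mathcal{T}] + \Pi(\beta)[\mathcal{E};\mathcal{T}]$    (5.37)

Proof: Suppose that $\alpha$ and $\beta$ are decomposed into subsets $\alpha'$ and $\alpha''$ and $\beta'$ and $\beta''$, respectively, where $\alpha'$ and $\beta'$ contain subpatterns defined by $[\mathcal{E};\mathcal{T}]$ but $\alpha''$ and $\beta''$ do not. The results of the two A-Project operations on $\alpha'$ and $\beta'$ are $\Pi(\alpha')[\mathcal{E};\mathcal{T}]$ and $\Pi(\beta')[\mathcal{E};\mathcal{T}]$, respectively. By the definition of the A-Project operation, we have

$\Pi(\alpha + \beta)[\mathcal{E};\mathcal{T}] = \Pi(\alpha')[\mathcal{E};\mathcal{T}] + \Pi(\beta')[\mathcal{E};\mathcal{T}] = \Pi(\alpha)[\mathcal{E};\mathcal{T}] + \Pi(\beta)[\mathcal{E};\mathcal{T}]$  □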

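(9)  $(\alpha + \beta) - \gamma = (\alpha - \gamma) + (\beta - \gamma)$    (5.40)

Proof: $\alpha$ and $\beta$ are decomposed into subsets $\alpha'$ and $\alpha''$ and $\beta'$ and $\beta''$, respectively, where $\alpha'$ and $\beta'$ contain $\gamma$ patterns but $\alpha''$ and $\beta''$ do not. Thus, we have

$(\alpha + \beta) - \gamma = \alpha'' + \beta'' = (\alpha - \gamma) + (\beta - \gamma)$  □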

(10)  $\alpha \div_{\{W\}} (\beta + \gamma) = \alpha \div_{\{W\}} \beta \div_{\{W\}} \gamma$    (5.41)

Proof: By the definition of the A-Divide operation, on the left-hand side of the equation, an $\alpha$ pattern will be retained in the result if (a) it has Inner-patterns of classes in $\{W\}$ and contains all patterns of $\beta$ and $\gamma$, or (b) the Inner-patterns of classes in $\{W\}$ that an $\alpha$ pattern has are common to some other $\alpha$ patterns and these patterns together, denoted by $\alpha'$, contain all patterns of $\beta$ and $\gamma$.

An $\alpha$ pattern (or patterns in $\alpha'$) which is retained on the left-hand side of the equation will be retained after the first A-Divide operation on the right-hand side since it must contain all the $\beta$ patterns. It will also be retained in the final result after the second A-Divide operation since it must contain all the $\gamma$ patterns.  □



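(11)  $(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}})\,|[R(C,D)]\,\gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)] (\beta_{\{Y\}}\,|[R(C,D)]\,\gamma_{\{Z\}})$    (5.42)

$C \notin \{X\}$ and $B \notin \{Z\}$

Proof: $\beta$ is decomposed into $\beta' + \beta'' + \beta''' + \beta''''$, where $\beta'$ and $\beta''$ can be concatenated with $\alpha$ patterns but $\beta'''$ and $\beta''''$ cannot; $\beta'$ and $\beta'''$ can be concatenated with $\gamma$ patterns by Complement-patterns but $\beta''$ and $\beta''''$ cannot; and $\beta''''$ can be neither concatenated with $\alpha$ patterns nor concatenated with $\gamma$ patterns by Complement-patterns. $\alpha$ is decomposed into $\alpha' + \alpha''$, where $\alpha'$ can be concatenated with $\beta$ patterns but $\alpha''$ cannot. $\gamma$ is decomposed into $\gamma' + \gamma''$ with a similar interpretation. Thus, we have

$(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}})\,|[R(C,D)]\,\gamma_{\{Z\}} = (\alpha'\beta' + \alpha'\beta'')\,|[R(C,D)]\,\gamma$
$= \alpha'\beta'\gamma'$

$\alpha_{\{X\}} *[R(A,B)] (\beta_{\{Y\}}\,|[R(C,D)]\,\gamma_{\{Z\}}) = \alpha *[R(A,B)] (\beta'\gamma' + \beta'''\gamma')$
$= \alpha'\beta'\gamma'$  □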

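(12)  $(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha_{\{X\}} - \gamma_{\{Z\}}) *[R(A,B)] \beta_{\{Y\}}$    $(\{Y\} \cap \{Z\} = \phi)$    (5.43)

$= \alpha_{\{X\}} *[R(A,B)] (\beta_{\{Y\}} - \gamma_{\{Z\}})$    $(\{X\} \cap \{Z\} = \phi)$

Proof: We shall prove the first case. The second case can be proved similarly. $\alpha$ is decomposed into $\alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ and $\alpha''$ can be concatenated with $\beta$ patterns but $\alpha'''$ and $\alpha''''$ cannot; and $\alpha'$ and $\alpha'''$ contain $\gamma$ patterns but $\alpha''$ and $\alpha''''$ do not. $\beta$ is decomposed into $\beta' + \beta''$, where $\beta'$ can be concatenated with $\alpha$ patterns but $\beta''$ cannot. Since $\{Y\} \cap \{Z\} = \phi$, none of the $\beta$ patterns contains a $\gamma$ pattern and we have

$(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha'\beta' + \alpha''\beta') - \gamma$
$= \alpha''\beta'$

$(\alpha_{\{X\}} - \gamma_{\{Z\}}) *[R(A,B)] \beta_{\{Y\}} = (\alpha'' + \alpha'''') *[R(A,B)] (\beta' + \beta'')$
$= \alpha''\beta'$  □

(13)  $(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}}) \bullet \gamma_{\{Z\}} = (\alpha_{\{X\}} \bullet \gamma_{\{Z\}}) *[R(A,B)] \beta_{\{Y\}}$    (5.44)
$(\{Y\} \cap \{Z\} = \phi \wedge A \in \{X\})$

$= \alpha_{\{X\}} *[R(A,B)] (\beta_{\{Y\}} \bullet \gamma_{\{Z\}})$
$(\{X\} \cap \{Z\} = \phi \wedge B \in \{Y\})$

Proof: We only give the proof of the first case. The decompositions of $\alpha$, $\beta$, and $\gamma$ are as follows: $\alpha = \alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ and $\alpha''$ can be concatenated with $\beta$ patterns and $\alpha'''$ and $\alpha''''$ cannot, and $\alpha'$ and $\alpha'''$ intersect with $\gamma$ patterns and $\alpha''$ and $\alpha''''$ do not; $\beta = \beta' + \beta''$, where $\beta'$ can be concatenated with $\alpha$ patterns and $\beta''$ cannot; $\gamma = \gamma' + \gamma''$, where $\gamma'$ intersects $\alpha$ patterns and $\gamma''$ does not. When $\{Y\} \cap \{Z\} = \phi$, patterns of $\beta$ and $\gamma$ do not intersect with each other and we have

$(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}}) \bullet \gamma_{\{Z\}} = (\alpha'\beta' + \alpha''\beta') \bullet (\gamma' + \gamma'')$
$= \alpha'\beta'\gamma'$

$(\alpha_{\{X\}} \bullet \gamma_{\{Z\}}) *[R(A,B)] \beta_{\{Y\}} = (\alpha'\gamma' + \alpha'''\gamma') *[R(A,B)] (\beta' + \beta'')$
$= \alpha'\beta'\gamma'$  □

Note that the left-hand side of 5.44 is in a distributive form of $*$ with respect to $\bullet$ but the distributive property cannot be applied because it requires that $A$ be in both $\alpha$ and $\beta$ and $\gamma$ be a homogeneous association-set.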


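(14)  $\alpha\,![R(A,B)]\,(\beta + \gamma)$    (5.48)

$= \alpha\,![R(A,B)]\,\beta - \Pi(\alpha *[R(A,B)] \gamma)[\alpha] + \alpha\,![R(A,B)]\,\gamma - \Pi(\alpha *[R(A,B)] \beta)[\alpha]$

where $\alpha$, $\beta$, and $\gamma$ are homogeneous association-sets.

Proof: $\alpha$ can be decomposed as $\alpha = \alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ can be concatenated with $\beta$ by Inter-patterns but not with $\gamma$; $\alpha''$ can be concatenated with $\gamma$ by Inter-patterns but not with $\beta$; $\alpha'''$ can be concatenated with both $\beta$ and $\gamma$ by Inter-patterns; and $\alpha''''$ cannot be concatenated with $\beta$ and $\gamma$. $\alpha'$, $\alpha''$, $\alpha'''$, and $\alpha''''$ are mutually exclusive. $\beta$ is decomposed into $\beta' + \beta''$, where $\beta'$ can be concatenated with $\alpha$ but $\beta''$ cannot. $\gamma$ can be decomposed in the same way as $\beta$.

By the definition of the NonAssociate operation we have

left-hand side $= (\alpha' + \alpha'' + \alpha''' + \alpha'''')\,![R(A,B)]\,(\beta' + \beta'' + \gamma' + \gamma'')$

$$= \begin{cases}
\alpha''''\gamma'' & \text{if } \beta'' = \phi \\
\alpha''''\beta'' & \text{if } \gamma'' = \phi \\
\beta'' + \gamma'' & \text{if } \alpha'''' = \phi \\
\gamma'' & \text{if } \alpha'''' = \beta'' = \phi \\
\beta'' & \text{if } \alpha'''' = \gamma'' = \phi \\
\alpha'''' & \text{if } \beta'' = \gamma'' = \phi \\
\alpha''''\beta'' + \alpha''''\gamma'' & \text{otherwise}
\end{cases}$$

Since $\alpha *[R(A,B)] \beta = \alpha'\beta' + \alpha'''\beta'$, we have $\Pi(\alpha *[R(A,B)] \beta)[\alpha] = \alpha' + \alpha'''$. Similarly, $\Pi(\alpha *[R(A,B)] \gamma)[\alpha] = \alpha'' + \alpha'''$. Therefore, on the right-hand side we have

$$\alpha\,![R(A,B)]\,\beta - (\alpha'' + \alpha''') = \begin{cases}
\alpha'''' & \text{if } \beta'' = \phi \\
\beta'' & \text{if } \alpha'''' = \phi \\
\alpha''''\beta'' & \text{otherwise}
\end{cases}$$

$$\alpha\,![R(A,B)]\,\gamma - (\alpha' + \alpha''') = \begin{cases}
\alpha'''' & \text{if } \gamma'' = \phi \\
\gamma'' & \text{if } \alpha'''' = \phi \\
\alpha''''\gamma'' & \text{otherwise}
\end{cases}$$

Hence,

right-hand side $= \alpha\,![R(A,B)]\,\beta - (\alpha'' + \alpha''') + \alpha\,![R(A,B)]\,\gamma - (\alpha' + \alpha''')$

$$= \begin{cases}
\alpha''''\gamma'' & \text{if } \beta'' = \phi \\
\alpha''''\beta'' & \text{if } \gamma'' = \phi \\
\beta'' + \gamma'' & \text{if } \alpha'''' = \phi \\
\gamma'' & \text{if } \alpha'''' = \beta'' = \phi \\
\beta'' & \text{if } \alpha'''' = \gamma'' = \phi \\
\alpha'''' & \text{if } \beta'' = \gamma'' = \phi \\
\alpha''''\beta'' + \alpha''''\gamma'' & \text{otherwise}
\end{cases}$$

□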

(15)  $\alpha - (\beta + \gamma) = \alpha - \beta - \gamma$    (5.51)

Proof: By the definition of the A-Difference operation, the left-hand side of the equation retains $\alpha$ patterns that do not contain any pattern of $\beta$ or $\gamma$. On the right-hand side, the first A-Difference operation retains $\alpha$ patterns that do not contain any $\beta$ pattern and then the second operation retains $\alpha$ patterns that do not contain any pattern of $\beta$ or $\gamma$.  □
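The argument for 5.51 amounts to filtering $\alpha$ once against the union or twice against each operand in turn. A minimal Python sketch of that reading follows. It is an assumption-laden toy model, not the system's implementation: patterns are frozensets of (class, object-id) pairs, containment of one pattern in another is modelled as the subset relation, A-Union is plain set union, and the name `a_difference` and the sample data are hypothetical.

```python
def a_difference(alpha, beta):
    """Toy A-Difference: keep alpha patterns containing no beta pattern."""
    return {a for a in alpha if not any(b <= a for b in beta)}


# Hypothetical sample association-sets.
q1 = frozenset({("A", 1), ("B", 5)})   # contains the beta pattern
q2 = frozenset({("A", 2), ("B", 6)})   # contains the gamma pattern
q3 = frozenset({("A", 3), ("B", 7)})   # contains neither
alpha = {q1, q2, q3}
beta = {frozenset({("B", 5)})}
gamma = {frozenset({("A", 2)})}

# Left-hand side: one difference against the union of beta and gamma.
lhs = a_difference(alpha, beta | gamma)
# Right-hand side: two successive differences.
rhs = a_difference(a_difference(alpha, beta), gamma)
```

In this instance the first difference on the right-hand side removes `q1`, the second removes `q2`, and both sides retain only `q3`.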
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